Abstract

The problem of constructing confidence sets that are adaptive in $L^2$-loss over a
continuous scale of Sobolev classes of probability densities is considered. Adaptation 
holds, where possible, with respect to both the radius of the Sobolev ball and its 
smoothness degree, and over maximal parameter spaces for which adaptation is 
possible. Two key regimes of parameter constellations are identified: one where 
full adaptation is possible, and one where adaptation requires critical regions be 
removed. Techniques used to derive these results include a general nonparametric 
minimax test for infinite-dimensional null- and alternative hypotheses, and new lower 
bounds for $L^2$-adaptive confidence sets.

1 Introduction 

The paradigm of adaptive nonparametric inference has developed a fairly complete
theory for estimation and testing - we mention the key references [?], [3], [?], [?], [?] - but
the theory of adaptive confidence statements has not succeeded to the same extent, and
consists in a significant part of negative results that are in a somewhat puzzling contrast 
to the fact that adaptive estimators exist. The topic of confidence sets is, however, of 
vital importance, since it addresses the question of whether the accuracy of adaptive 
estimation can itself be estimated, and to what extent the abundance of adaptive risk 
bounds and oracle inequalities in the literature are useful for statistical inference. 

In this article we give a set of necessary and sufficient conditions for when confidence
sets that adapt to unknown smoothness in $L^2$-diameter exist in the problem of
nonparametric density estimation on [0, 1]. The scope of our techniques extends without 
difficulty to density estimation on the real line, and also to other common function esti- 
mation problems such as nonparametric regression or Gaussian white noise. Our focus 
on $L^2$-type confidence sets is motivated by the fact that they involve the most commonly
used loss function in adaptive estimation problems, and so deserve special attention in
the theory of adaptive inference.

We can illustrate some main ideas by the simple example of two fixed Sobolev-type
classes. Let $X_1, \dots, X_n$ be i.i.d. with common probability density $f$ contained in the
space $L^2$ of square-integrable functions on $[0,1]$. Let $\Sigma(r) \equiv \Sigma(r,B)$ be a Sobolev ball
of probability densities on $[0,1]$, of Sobolev-norm radius $B$ - see Section 2 for precise
definitions - and consider adaptation to the submodel $\Sigma(s) \subset \Sigma(r)$, $s > r$. An adaptive
estimator exists, achieving the optimal rate $n^{-s/(2s+1)}$ for $f \in \Sigma(s)$ and $n^{-r/(2r+1)}$
otherwise, in $L^2$-risk; see for instance Theorem 2 below.

A confidence set is a random subset $C_n = C(X_1, \dots, X_n)$ of $L^2$. Define the
$L^2$-diameter of a norm-bounded subset $C$ of $L^2$ as

$$|C| = \inf\{\tau : C \subset \{h : \|h - g\|_2 \le \tau\} \text{ for some } g \in L^2\}, \qquad (1)$$

equal to the radius of the smallest $L^2$-ball containing $C$. For $G \subset L^2$ set
$\|f - G\|_2 = \inf_{g \in G} \|f - g\|_2$ and define, for $\rho_n \ge 0$ a sequence of real numbers, the separated sets

$$\tilde\Sigma(r, \rho_n) \equiv \tilde\Sigma(r, s, B, \rho_n) = \{f \in \Sigma(r) : \|f - \Sigma(s)\|_2 \ge \rho_n\}.$$

Obviously $\tilde\Sigma(r, 0) = \Sigma(r)$, but for $\rho_n > 0$ these sets are proper subsets of $\Sigma(r) \setminus \Sigma(s)$.

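We are interested in adaptive inference in the model

$$\mathcal{P}_n \equiv \Sigma(s) \cup \tilde\Sigma(r, \rho_n)$$

under minimal assumptions on the size of $\rho_n$. We shall say that the confidence set $C_n$ is
$L^2$-adaptive and honest for $\mathcal{P}_n$ if there exists a constant $M$ such that for every $n \in \mathbb{N}$,

$$\sup_{f \in \Sigma(s)} \Pr_f\{|C_n| > M n^{-s/(2s+1)}\} \le \alpha', \qquad (2)$$
$$\sup_{f \in \tilde\Sigma(r, \rho_n)} \Pr_f\{|C_n| > M n^{-r/(2r+1)}\} \le \alpha' \qquad (3)$$

and if

$$\inf_{f \in \mathcal{P}_n} \Pr_f\{f \in C_n\} \ge 1 - \alpha - r_n \qquad (4)$$

where $r_n \to 0$ as $n \to \infty$. We regard the constants $\alpha, \alpha'$ as given 'significance levels'.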

Theorem 1. Let $0 < \alpha, \alpha' < 1$, $s > r > 1/2$ and $B \ge 1$ be given.

A) An $L^2$-adaptive and honest confidence set for $\tilde\Sigma(r, \rho_n) \cup \Sigma(s)$ exists if one of the
following conditions is satisfied:

i) $s \le 2r$ and $\rho_n \ge 0$
ii) $s > 2r$ and

$$\rho_n \ge M n^{-r/(2r+1/2)}$$

for every $n \in \mathbb{N}$ and some constant $M$ that depends on $\alpha, \alpha', r, B$.

B) If $s > 2r$ and $C_n$ is an $L^2$-adaptive and honest confidence set for $\tilde\Sigma(r, \rho_n) \cup \Sigma(s)$, for
every $\alpha, \alpha' > 0$, then necessarily

$$\liminf_n \rho_n n^{r/(2r+1/2)} > 0.$$

We note first that for $s \le 2r$ adaptive confidence sets exist without any additional
restrictions - this is a main finding of the papers [21], [6], [28] and has important precursors
[24], [16]. It is based on the idea that under the general assumption $f \in \Sigma(r)$ we
may estimate the $L^2$-risk of any adaptive estimator of $f$ at precision $n^{-r/(2r+1/2)}$, which
is $O(n^{-s/(2s+1)})$ precisely when $s \le 2r$. As soon as one wishes to adapt to smoothness
$s > 2r$, however, this cannot be used anymore, and adaptive confidence sets then require
separation of $\Sigma(s)$ and $\Sigma(r) \setminus \Sigma(s)$ (i.e., $\rho_n > 0$). Maximal subsets of $\Sigma(r)$ over which
$L^2$-adaptive confidence sets do exist in the case $s > 2r$ are given in Theorem 1, with
separation sequence $\rho_n$ characterised by the asymptotic order $n^{-r/(2r+1/2)}$. This rate
has, as we show in this article, a fundamental interpretation as the minimax rate of
testing between the composite hypotheses

$$H_0 : f \in \Sigma(s) \quad \text{against} \quad H_1 : f \in \tilde\Sigma(r, \rho_n). \qquad (5)$$
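
The role of the threshold $s = 2r$ can be checked by comparing exponents. The following is a minimal sketch and not part of the original argument; the function names are illustrative assumptions. It compares the exponent of the risk-estimation precision $n^{-r/(2r+1/2)}$ with that of the adaptive rate $n^{-s/(2s+1)}$, using exact rational arithmetic.

```python
from fractions import Fraction


def adaptive_rate_exponent(s):
    """Exponent a of the adaptive estimation rate n^(-a), a = s / (2s + 1)."""
    s = Fraction(s)
    return s / (2 * s + 1)


def risk_precision_exponent(r):
    """Exponent b of the risk-estimation precision n^(-b), b = r / (2r + 1/2)."""
    r = Fraction(r)
    return r / (2 * r + Fraction(1, 2))


def precision_matches_rate(r, s):
    """True when n^(-r/(2r+1/2)) = O(n^(-s/(2s+1))), i.e. when b >= a."""
    return risk_precision_exponent(r) >= adaptive_rate_exponent(s)


if __name__ == "__main__":
    r = Fraction(1)
    for s in (Fraction(3, 2), Fraction(2), Fraction(3)):
        print(f"r={r}, s={s}: precision suffices = {precision_matches_rate(r, s)}")
```

For $r = 1$ the check succeeds for $s = 3/2$ and $s = 2$ and fails for $s = 3$, in line with the condition $s \le 2r$.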



The occurrence of this rate in Theorem 1 parallels similar findings in Theorem 2 in [?]
in the different situation of confidence bands, and is inspired by the general ideas in
[?], [17], [23], [?] which attempt to find 'maximal' subsets of the usual parameter spaces
of adaptive estimation for which honest confidence statements can be constructed. Our
results can be construed as saying that for $s > 2r$ confidence sets that are $L^2$-adaptive
exist precisely over those subsets of the parameter space $\Sigma(r)$ for which the target $s$ of
adaptation is testable in a minimax way.

Our solution of (5) is achieved in Proposition 2 below, where we construct consistent
tests for general composite problems of the kind

$$H_0 : f \in \Sigma \quad \text{against} \quad H_1 : f \in \Sigma(r),\ \|f - \Sigma\|_2 \ge \rho_n, \qquad \Sigma \subset \Sigma(r),$$

whenever the sequence $\rho_n$ is at least of the order $\max(n^{-r/(2r+1/2)}, r_n)$, where $r_n$ is
related to the complexity of $\Sigma$ by an entropy condition. In the case $\Sigma = \Sigma(s)$ with
$s > 2r$ relevant here we can establish $r_n = n^{-s/(2s+1)} = o(n^{-r/(2r+1/2)})$, so that this test
is minimax in light of lower bounds in [19], [20].

While the case of two fixed smoothness classes in Theorem 1 is appealing in its
conceptual simplicity, it does not describe the typical adaptation problem, where one
wants to adapt to a continuous smoothness parameter $s$ in a window $[r, R]$. Moreover
the radius $B$ of $\Sigma(s)$ is, unlike in Theorem 1, typically unknown, and the usual practice
of 'undersmoothing' to deal with this problem incurs a rate-penalty for adaptation that
we wish to avoid here. Instead, we shall address the question of simultaneous exact
adaptation to the radius $B$ and to the smoothness $s$. We first show that such strong
adaptation is possible if $R < 2r$, see Theorem 3. In the general case $R > 2r$ we can use the
ideas from Theorem 1 as follows: starting from a fixed largest model $\Sigma(r, B_0)$ with $r, B_0$
known, we discretise $[r, R]$ into a finite grid $\mathcal{S}$ consisting of progressions $r, 2r, 4r, \dots$,
and then use the minimax test for (5) in an iterated way to select the optimal value
in $\mathcal{S}$. We then use the methods underlying Theorem 1 A i) in the selected window, and
show that this gives honest adaptive confidence sets over 'maximal' parameter subspaces
$\mathcal{P}_n \subset \Sigma(r, B_0)$. In contrast to what is possible in the $L^\infty$-situation studied in [?], the sets
$\mathcal{P}_n$ asymptotically contain all of $\Sigma(r, B_0)$, highlighting yet another difference between
the $L^2$- and $L^\infty$-theory. See Proposition 1 and Theorem 5 below for details. We also
present a new lower bound which implies that for $R > 2r$ even 'pointwise in $f$' inference
is impossible for the full parameter space of probability densities in the $r$-Sobolev space,
see Theorem 4. In other words, even asymptotically one has to remove certain subsets
of the maximal parameter space if one wants to construct confidence sets that adapt
to arbitrary smoothness degrees. One way to do so is to restrict the space a priori to
a fixed ball $\Sigma(r, B_0)$ of known radius as discussed above, but other assumptions come
to mind, such as 'self-similarity' conditions employed in [27], [?], [?], [?] for confidence
intervals and bands. We discuss briefly how this applies in the $L^2$-setting.

We state all main results other than Theorem 1 above in Sections 2 and 3, and proofs
are given, in a unified way, in Section 4.



2 The Setting 

2.1 Wavelets and Sobolev-Besov Spaces 

Denote by $L^2 := L^2([0,1])$ the Lebesgue space of square integrable functions on $[0,1]$,
normed by $\|\cdot\|_2$. For integer $s$ the classical Sobolev spaces are defined as the spaces of
functions $f \in L^2$ whose (distributional) derivatives $D^\alpha f$, $0 < \alpha \le s$, all lie in $L^2$. One
can describe these spaces, for $s > 0$ any real number, in terms of the natural sequence
space isometry of $L^2$ under an orthonormal basis. We opt here to work with wavelet
bases: for index sets $\mathcal{Z} \subset \mathbb{Z}$, $\mathcal{Z}_l \subset \mathbb{Z}$ and $J_0 \in \mathbb{N}$, let

$$\{\phi_{J_0 m}, \psi_{lk} : m \in \mathcal{Z},\ k \in \mathcal{Z}_l,\ l \ge J_0 + 1,\ l \in \mathbb{N}\}$$

be a compactly supported orthonormal wavelet basis of $L^2$ of regularity $S$, where as
usual, $\psi_{lk} = 2^{l/2}\psi_k(2^l \cdot)$. We shall only consider Cohen-Daubechies-Vial [11] wavelet
bases where $|\mathcal{Z}_l| = 2^l$, $|\mathcal{Z}| \le c(S) < \infty$, $J_0 = J_0(S)$. We define, for $\langle f, g \rangle = \int_0^1 fg$ the
usual $L^2$-inner product, and for $0 \le s < S$, the Sobolev (-type) norms



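$$\|f\|_{s,2} := \max\left( 2^{J_0 s} \sqrt{\sum_{k \in \mathcal{Z}} \langle f, \phi_{J_0 k} \rangle^2},\ \sup_{l \ge J_0 + 1} 2^{ls} \sqrt{\sum_{k \in \mathcal{Z}_l} \langle f, \psi_{lk} \rangle^2} \right)$$
$$= \max\left( 2^{J_0 s} \|\langle f, \phi_{J_0 \cdot} \rangle\|_2,\ \sup_{l \ge J_0 + 1} 2^{ls} \|\langle f, \psi_{l \cdot} \rangle\|_2 \right) \qquad (6)$$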



where in slight abuse of notation we use the symbol $\|\cdot\|_2$ for the sequence norms on
$\ell^2(\mathcal{Z}_l)$, $\ell^2(\mathcal{Z})$ as well as for the usual norm on $L^2$. Define moreover the Sobolev (-type)
spaces

$$W^s \equiv B^s_{2\infty} = \{f \in L^2 : \|f\|_{s,2} < \infty\}.$$

We note here that $W^s$ is not the classical Sobolev space - in this case the supremum over
$l \ge J_0 + 1$ would have to be replaced by summation over $l$ - but the present definition
gives rise to the slightly larger Besov space $B^s_{2\infty}$, which will turn out to be the natural
exhaustive class for our results below. We still refer to them as Sobolev spaces for
simplicity, and since the main idea is to measure smoothness in $L^2$. We understand $W^s$
as spaces of continuous functions whenever $s > 1/2$ (possible by standard embedding
theorems). We shall moreover set, in abuse of notation, $\phi_{J_0 k} \equiv \psi_{J_0 k}$ (which does not
equal $2^{-1/2}\psi_{J_0+1,k}(2^{-1}\cdot)$) in order for the wavelet series of a function $f \in L^2$ to have the
compact representation

$$f = \sum_{l=J_0}^{\infty} \sum_{k \in \mathcal{Z}_l} \psi_{lk} \langle \psi_{lk}, f \rangle,$$

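with the understanding that $\mathcal{Z}_{J_0} = \mathcal{Z}$. The wavelet projection $\Pi_{V_j}(f)$ of $f \in L^2$ onto
the span $V_j$ in $L^2$ of

$$\{\phi_{J_0 m}, \psi_{lk} : m \in \mathcal{Z},\ k \in \mathcal{Z}_l,\ J_0 + 1 \le l < j\}$$

equals

$$K_j(f)(x) \equiv \int_0^1 K_j(x,y) f(y)\,dy \equiv 2^j \int_0^1 K(2^j x, 2^j y) f(y)\,dy = \sum_{l=J_0}^{j-1} \sum_{k \in \mathcal{Z}_l} \langle f, \psi_{lk} \rangle \psi_{lk}(x)$$

where $K(x,y) = \sum_k \phi_{J_0 k}(x)\phi_{J_0 k}(y)$ is the wavelet projection kernel.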



2.2 Adaptive Estimation in $L^2$

Let $X_1, \dots, X_n$ be i.i.d. with common density $f$ on $[0,1]$, with joint distribution equal to
the first $n$ coordinate projections of the infinite product probability measure $\Pr_f$. Write
$E_f$ for the corresponding expectation operator. We shall throughout make the minimal
assumption that $f \in W^r$ for some $r > 1/2$, which implies in particular, by Sobolev's
lemma, that $f$ is continuous and bounded on $[0,1]$. The adaptation problem arises from
the hope that $f \in W^s$ for some $s$ significantly larger than $r$, without wanting to commit
to a particular a priori value of $s$. In this generality the problem is still not meaningful,
since the regularity of $f$ is not only described by containment in $W^s$, but also by the size
of the Sobolev norm $\|f\|_{s,2}$. If one defines, for $0 < s < \infty$, $1 \le B < \infty$, the Sobolev-balls
of densities

$$\Sigma(s,B) := \left\{f : [0,1] \to [0,\infty),\ \int_0^1 f = 1,\ \|f\|_{s,2} \le B\right\}, \qquad (7)$$

then Pinsker's minimax theorem (for density estimation) gives, as $n \to \infty$,

$$\inf_{T_n} \sup_{f \in \Sigma(s,B)} E_f\|T_n - f\|_2^2 \sim c(s) B^{2/(2s+1)} n^{-2s/(2s+1)} \qquad (8)$$

for some constant $c(s) > 0$ depending only on $s$, and where the infimum extends over
all measurable functions $T_n$ of $X_1, \dots, X_n$ (cf., e.g., the results in Theorem 5.1 in [10]).
So any risk bound, attainable uniformly for elements $f \in \Sigma(s,B)$, cannot improve on
$B^{2/(2s+1)} n^{-2s/(2s+1)}$ up to multiplicative constants. If $s, B$ are known then constructing
estimators that attain this bound is possible, even with the asymptotically exact constant
$c(s)$. The adaptation problem poses the question of whether estimators can attain such
a risk bound without requiring knowledge of $B, s$.
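
To get a feeling for how the benchmark in (8) depends on smoothness and radius, one can evaluate its order numerically. This is only an illustrative sketch: the function name is an assumption, and the constant $c(s)$ from (8) is deliberately left out, so the values are orders of magnitude rather than exact minimax risks.

```python
def minimax_risk_order(s, B, n):
    """Order B^(2/(2s+1)) * n^(-2s/(2s+1)) of the minimax squared L2-risk in (8).

    The constant c(s) from (8) is omitted on purpose.
    """
    if s <= 0 or B < 1 or n < 1:
        raise ValueError("expected s > 0, B >= 1 and n >= 1")
    exponent = 2.0 * s / (2.0 * s + 1.0)
    return B ** (2.0 / (2.0 * s + 1.0)) * n ** (-exponent)


if __name__ == "__main__":
    for s in (1.0, 2.0, 4.0):
        print(s, minimax_risk_order(s, B=1.0, n=1000))
```

Larger $s$ gives a smaller benchmark for fixed $n$ and $B$, which is exactly the gain an adaptive procedure tries to realise without knowing $s$ or $B$.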

The paradigm of adaptive estimation has provided us with a positive answer to this 
problem, and one can prove the following result. 

Theorem 2. Let $1/2 < r < R < \infty$ be given. Then there exists an estimator $\hat f_n =
\hat f(X_1, \dots, X_n, r, R)$ such that, for every $s \in [r, R]$, every $B \ge 1$, $U > 0$, and every $n \in \mathbb{N}$,

$$\sup_{f \in \Sigma(s,B),\ \|f\|_\infty \le U} E_f\|\hat f_n - f\|_2^2 \le c B^{2/(2s+1)} n^{-2s/(2s+1)}$$

for a constant $0 < c < \infty$ that depends only on $r, R, U$.

If one wishes to adapt to the radius $B \in [1, B_0]$ then the canonical choice for $U$ is

$$\sup_{f \in \Sigma(r, B_0)} \|f\|_\infty \le c(r) B_0 \equiv U < \infty, \qquad (9)$$

but other choices will be possible below. More elaborate techniques allow for $c$ to depend
only on $s$, and even to obtain the exact asymptotic minimax 'Pinsker'-constant, see for
instance Theorem 5.1 in [10]. We shall not study exact constants here, mostly to simplify
the exposition and to focus on the main problem of confidence statements, but also since
exact constants are asymptotic in nature and we prefer to give nonasymptotic bounds.

From a 'pointwise in $f$' perspective we can conclude from Theorem 2 that adaptive
estimation is possible over the full continuous Sobolev scale

$$\bigcup_{s \in [r,R]} W^s \cap \left\{f : [0,1] \to [0,\infty),\ \int_0^1 f = 1\right\};$$

for any probability density $f \in W^s$, $s \in [r, R]$, the single estimator $\hat f_n$ satisfies

$$E_f\|\hat f_n - f\|_2^2 \le c \|f\|_{s,2}^{2/(2s+1)} n^{-2s/(2s+1)}$$

where $c$ depends on $r, R, \|f\|_\infty$. Since $\hat f_n$ does not depend on $B, U$ or $s$ we can say
that $\hat f_n$ adapts to both $s \in [r, R]$ and $B \in [1, B_0]$ simultaneously. If one imposes an
upper bound on $U$ then adaptation even holds for every $B \ge 1$. Our interest here is
to understand what remains of this remarkable result if one is interested in adaptive
confidence statements rather than in risk bounds.



3 Adaptive Confidence Sets for Sobolev Classes 
3.1 Honest Asymptotic Inference 

We aim to characterise those sets $\mathcal{P}_n$ consisting of uniformly bounded probability
densities $f \in W^r$ for which we can construct adaptive confidence sets. More precisely, we
seek random subsets $C_n$ of $L^2$ that depend only on known quantities, cover $f \in \mathcal{P}_n$ at
least with prescribed probability $1 - \alpha$, and have $L^2$-diameter $|C_n|$ adaptive with respect
to radius and smoothness with prescribed probability at least $1 - \alpha'$. To avoid discussing
measurability issues we shall tacitly assume throughout that $C_n$ lies within an $L^2$-ball
of radius $O(|C_n|)$ centered at a random variable $\tilde f_n \in L^2$.

Definition 1 ($L^2$-adaptive confidence sets). Let $X_1, \dots, X_n$ be i.i.d. on $[0,1]$ with
common density $f$. Let $0 < \alpha, \alpha' < 1$ and $1/2 < r < R$ be given and let $C_n = C(X_1, \dots, X_n)$
be a random subset of $L^2$. $C_n$ is called $L^2$-adaptive and honest for a sequence of
(nonempty) models $\mathcal{P}_n \subset W^r \cap \{f : \|f\|_\infty \le U\}$, if there exists a constant $L = L(r,R,U)$
such that for every $n \in \mathbb{N}$

$$\sup_{f \in \Sigma(s,B) \cap \mathcal{P}_n} \Pr_f\{|C_n| > L B^{1/(2s+1)} n^{-s/(2s+1)}\} \le \alpha' \quad \text{for every } s \in [r,R],\ B \ge 1, \qquad (10)$$

(the condition being void if $\Sigma(s,B) \cap \mathcal{P}_n$ is empty) and

$$\inf_{f \in \mathcal{P}_n} \Pr_f\{f \in C_n\} \ge 1 - \alpha - r_n \qquad (11)$$

where $r_n \to 0$ as $n \to \infty$.

To understand the scope of this definition some discussion is necessary. First, the 
interval [r, R] describes the range of smoothness parameters one wants to adapt to. 
Besides the restriction $1/2 < r < R < \infty$ the choice of this window of adaptation is
arbitrary (although the values of $R, r$ influence the constants). Second, if we wish to
adapt to $B$ in a fixed interval $[1, B_0]$ only, we may take $\mathcal{P}_n$ a subset of $\Sigma(r, B_0)$ and
the canonical choice of $U = c(r)B_0$ from (9). In such a situation (10) will still hold
for every $B \ge 1$ although the result will not be meaningful for $B > B_0$. Otherwise we
may impose an arbitrary uniform bound on $\|f\|_\infty$ and adapt to all $B \ge 1$. We require
here the sharp dependence on $B$ in (10) and thus exclude the usual 'undersmoothed',
near-adaptive, confidence sets in our setting. A natural 'maximal' model choice would
be $\mathcal{P}_n = \Sigma(r, B_0)$ $\forall n$ with $B_0 \ge 1$ arbitrary.



3.2 The Case R < 2r. 



A first result, the key elements of which have been discovered and discussed in [24], [16], [21],
[6], [28], is that $L^2$-adaptive confidence statements that parallel the situation of Theorem
2 exist without any additional restrictions whatsoever, in the case where $R < 2r$, so
that the window of adaptation is $[r, 2r)$. The sufficiency part of the following theorem
is a simple extension of results in Robins and van der Vaart [28] in that it shows that
adaptation is possible not only to the smoothness $s$, but also to the radius $B$. The main
idea of the proof is that, if $R < 2r$, the squared $L^2$-risk of $\hat f_n$ from Theorem 2 can be
estimated at a rate compatible with adaptation, by a suitable $U$-statistic.

Theorem 3. A) If $R < 2r$, then for any $\alpha, \alpha'$, there exists a confidence set $C_n =
C(X_1, \dots, X_n, r, R, \alpha, \alpha')$ which is honest and adaptive in the sense of Definition 1 for
any choice $\mathcal{P}_n \equiv \Sigma(r, B_0) \cap \{f : \|f\|_\infty \le U\}$, $B_0 \ge 1$, $U > 0$.
B) If $R > 2r$, then for $\alpha, \alpha'$ small enough no $C_n$ as in A) exists.

We emphasise that the confidence set $C_n$ constructed in the proof of Theorem 3
depends only on $r, R, \alpha, \alpha'$ and does not require knowledge of $B_0$ or $U$. Note however that
the sequence $r_n$ from Definition 1 does depend on $B_0$ - one may thus use $C_n$ without any
prior choice of parameters, but evaluation of its coverage is still relative to the model
$\Sigma(r, B_0)$. Arbitrariness of $B_0, U$ implies, by taking $B_0 = \|f\|_{s,2}$, $U = \|f\|_\infty$ in the above
result, that 'pointwise in $f$' adaptive inference is possible for any probability density in
the Sobolev space $W^r$.

Corollary 1. Let $0 < \alpha, \alpha' < 1$ and $1/2 < r < R$. Assume $R < 2r$. There exists a
confidence set $C_n = C(X_1, \dots, X_n, r, R, \alpha, \alpha')$ such that

i) $\liminf_n \Pr_f\{f \in C_n\} \ge 1 - \alpha$ for every probability density $f \in W^r$, and

ii) $\limsup_n \Pr_f\{|C_n| > L\|f\|_{s,2}^{1/(2s+1)} n^{-s/(2s+1)}\} \le \alpha'$ for every probability density
$f \in W^s$, $s \in [r, R]$, and some finite positive constant $L = L(r, R, \|f\|_\infty)$.

3.3 The Case of General R 

If we allow for general $R > 2r$ honest inference is not possible without restricting $\mathcal{P}_n$
further. In fact even a weaker 'pointwise in $f$' result of the kind of Corollary 1 is
impossible for general $R > 2r$. This is a consequence of the following lower bound.

Theorem 4. Fix $0 < \alpha < 1/2$, let $s > r$ be arbitrary. A confidence set $C_n =
C(X_1, \dots, X_n)$ in $L^2$ cannot satisfy

i) $\liminf_n \Pr_f\{f \in C_n\} \ge 1 - \alpha$ for every probability density $f \in W^r$, and
ii) $|C_n| = O_{\Pr_f}(r_n)$ for every probability density $f \in W^s$
at any rate $r_n = o(n^{-r/(2r+1/2)})$.

For $R > 2r$ we have $n^{-R/(2R+1)} = o(n^{-r/(2r+1/2)})$. Thus even from a 'pointwise in $f$'
perspective a confidence procedure cannot adapt to the entirety of densities in a Sobolev
space when $R > 2r$. On the other hand if we restrict to proper subsets of $W^r$, the
situation may qualitatively change. For instance if we wish to adapt to submodels of a
fixed Sobolev ball $\Sigma(r, B_0)$ with $r, B_0$ known, we have the following result.

Proposition 1. Let $0 < \alpha, \alpha' < 1$ and $1/2 < r < R$, $B_0 \ge 1$. There exists a confidence
set $C_n = C(X_1, \dots, X_n, B_0, r, R, \alpha, \alpha')$ such that

i) $\liminf_n \Pr_f\{f \in C_n\} \ge 1 - \alpha$ for every probability density $f \in \Sigma(r, B_0)$, and

ii) $\limsup_n \Pr_f\{|C_n| > L\|f\|_{s,2}^{1/(2s+1)} n^{-s/(2s+1)}\} \le \alpha'$ for every probability density
$f \in \Sigma(s, B_0)$, $s \in [r, R]$, and some finite positive constant $L = L(r, R, \|f\|_\infty)$.

Now if we compare Proposition 1 to Theorem 3 we see that there exists a genuine
discrepancy between honest and pointwise in $f$ adaptive confidence sets when $R > 2r$.
Of course Proposition 1 is not useful for statistical inference as the index $n$ from when
onwards coverage holds depends on the unknown $f$. The question arises whether there
are meaningful maximal subsets of $\Sigma(r, B_0)$ for which honest inference is possible. The
proof of Proposition 1 is in fact based on the construction of subsets $\mathcal{P}_n$ of $\Sigma(r, B_0)$
which grow dense in $\Sigma(r, B_0)$ and for which honest inference is possible. This approach
follows the ideas from Part A ii) in Theorem 1 and works as follows in the setting of
continuous $s \in [r, R]$: assume without loss of generality that $2^{N-1} r < R < 2^N r$ for
some $N \in \mathbb{N}$, $N > 1$, and define the grid



$$\mathcal{S} = \{s_m\}_{m=1}^{N} = \{r, 2r, 4r, \dots, 2^{N-1} r\}.$$



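Note that $\mathcal{S}$ is independent of $n$. Define, for $s \in \mathcal{S} \setminus \{s_N\}$,

$$\tilde\Sigma(s, \rho) := \tilde\Sigma(s, B_0, \mathcal{S}, \rho) = \{f \in \Sigma(s, B_0) : \|f - \Sigma(t, B_0)\|_2 \ge \rho \ \ \forall t > s,\ t \in \mathcal{S}\}.$$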



We will choose the separation rates

$$\rho_n(s) \sim n^{-s/(2s+1/2)},$$

equal to the minimax rate of testing between $\Sigma(s, B_0)$ and any submodel $\Sigma(t, B_0)$ for
$t \in \mathcal{S}$, $t > s$. The resulting model is therefore, for $M$ some positive constant,



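$$\mathcal{P}_n(M, \mathcal{S}) = \Sigma(s_N, B_0) \cup \left( \bigcup_{s \in \mathcal{S} \setminus \{s_N\}} \tilde\Sigma(s, M\rho_n(s)) \right).$$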
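
The grid and the separation rates described above can be sketched in a few lines. This is a hedged illustration rather than part of the construction: the function names are assumptions, the grid is taken to be $r, 2r, 4r, \dots$ truncated at $R$, and the constant $M$ multiplying the rates is omitted.

```python
def smoothness_grid(r, R):
    """Dyadic grid r, 2r, 4r, ... of all points not exceeding R."""
    if not (0 < r <= R):
        raise ValueError("expected 0 < r <= R")
    grid = []
    s = r
    while s <= R:
        grid.append(s)
        s = 2 * s
    return grid


def separation_rate(s, n):
    """Separation rate n^(-s/(2s+1/2)) attached to the grid point s."""
    return n ** (-s / (2.0 * s + 0.5))


def separation_rates(r, R, n):
    """Map every grid point except the largest one to its separation rate."""
    grid = smoothness_grid(r, R)
    return {s: separation_rate(s, n) for s in grid[:-1]}


if __name__ == "__main__":
    print(smoothness_grid(1, 5))
    print(separation_rates(1, 5, n=100))
```

Since the grid does not depend on $n$ while every rate tends to zero, the separation constraints become weaker as $n$ grows.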

The main idea behind the following theorem is to first construct a minimax test for
the nested hypotheses

$$\{H_s : f \in \tilde\Sigma(s, M\rho_n(s))\}_{s \in \mathcal{S} \setminus \{s_N\}},$$

then to estimate the risk of the adaptive estimator $\hat f_n$ from Theorem 2 under the
assumption that $f$ belongs to the smoothness hypothesis selected by the test, and to finally
construct a confidence set centered at $\hat f_n$ based on this risk estimate (as in the proof of
Theorem 3).

Theorem 5. Let $R > 2r$ and $B_0 \ge 1$ be arbitrary. There exists a confidence set
$C_n = C(X_1, \dots, X_n, B_0, r, R, \alpha, \alpha')$, honest and adaptive in the sense of Definition 1 for
$\mathcal{P}_n = \mathcal{P}_n(M, \mathcal{S})$, $n \in \mathbb{N}$, with $M$ a large enough constant and $U$ as in (9).

First note that, since $\mathcal{S}$ is independent of $n$, $\mathcal{P}_n(M, \mathcal{S}) \nearrow \Sigma(r, B_0)$ as $n \to \infty$, so that
the model $\mathcal{P}_n(M, \mathcal{S})$ grows dense in the fixed Sobolev ball, which for known $B_0$ is the
full model. This implies in particular Proposition 1.

An important question is whether $\mathcal{P}_n(M, \mathcal{S})$ was taken to grow as fast as possible as
a function of $n$, or in other words, whether a smaller choice of $\rho_n(s)$ would have been
possible. The lower bound in Theorem 1 implies that any faster choice for $\rho_n(s)$ makes
honest inference impossible. Indeed, if $C_n$ is an honest confidence set over $\mathcal{P}_n(M, \mathcal{S})$
with a faster separation rate $\rho_n' = o(\rho_n(s))$ for some $s \in \mathcal{S} \setminus \{s_N\}$, then we can use $C_n$
to test $H_0 : f \in \Sigma(s')$ against $H_1 : f \in \tilde\Sigma(s, \rho_n')$ for some $s' > 2s$, which by the proof of
Theorem 1 gives a contradiction.

3.3.1 Self-Similarity Conditions

The proof of Theorem 5 via testing smoothness hypotheses is strongly tied to knowledge
of the upper bound $B_0$ for the radius of the Sobolev ball, but as discussed above, this
cannot be avoided without contradicting Theorem 4. Alternative ways to restrict $W^r$,
other than constraining the radius, and which may be practically relevant, are given in
[27], [5], [23], [?]. The authors instead restrict to 'self-similar' functions, whose regularity is
similar at large and small scales. As the results in these references prove adaptation in $L^\infty$, they
naturally imply adaptation also in $L^2$; the functions excluded, however, are now those
whose norm is hard to estimate, rather than those whose norm is merely large. In the
$L^2$-case we need to estimate $s$ only up to a small constant; as this is more favourable
than the $L^\infty$-situation, one may impose weaker self-similarity assumptions, tailored to
the $L^2$-situation. This can be achieved arguing in a similar fashion to Bull [5], but we
do not pursue this further in the present paper.



4 Proofs 

4.1 Some Concentration Inequalities 

Let $X_i$, $i = 1, 2, \dots$, be the coordinates of the product probability space $(T, \mathcal{T}, P)^{\mathbb{N}}$, where
$P$ is any probability measure on $(T, \mathcal{T})$, $P_n = n^{-1}\sum_{i=1}^n \delta_{X_i}$ the empirical measure, $E$
expectation under $P^{\mathbb{N}} \equiv \Pr$. For $M$ any set and $H : M \to \mathbb{R}$, set $\|H\|_M = \sup_{m \in M} |H(m)|$.
We also write $Pf = \int_T f\,dP$ for measurable $f : T \to \mathbb{R}$.

The following Bernstein-type inequality for canonical $U$-statistics of order two is due
to Giné, Latała and Zinn [?], with refinements about the numerical constants in Houdré
and Reynaud-Bouret [?]: let $R(x,y)$ be a symmetric real-valued function defined on
$T \times T$, such that $ER(X, x) = 0$ for all $x$, and let

$$\Lambda_1^2 = \frac{n(n-1)}{2} ER^2(X_1, X_2),$$
$$\Lambda_2 = n \sup\{E[R(X_1, X_2)\zeta(X_1)\xi(X_2)] : E\zeta^2(X_1) \le 1,\ E\xi^2(X_1) \le 1\},$$
$$\Lambda_3 = \|nER^2(X_1, \cdot)\|_\infty^{1/2}, \qquad \Lambda_4 = \|R\|_\infty.$$

Let moreover $U_n^{(2)}(R) = \frac{2}{n(n-1)}\sum_{i<j} R(X_i, X_j)$ be the corresponding degenerate
$U$-statistic of order two. Then, there exists a universal constant $0 < C < \infty$ such that for
all $u > 0$ and $n \in \mathbb{N}$:

$$\Pr\left\{\frac{n(n-1)}{2}\,|U_n^{(2)}(R)| > C(\Lambda_1 u^{1/2} + \Lambda_2 u + \Lambda_3 u^{3/2} + \Lambda_4 u^2)\right\} \le 6\exp\{-u\}. \qquad (12)$$



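We will also need Talagrand's [30] inequality for empirical processes. Let $\mathcal{F}$ be a
countable class of measurable functions on $T$ that take values in $[-1/2, 1/2]$, or, if $\mathcal{F}$
is $P$-centered, in $[-1, 1]$. Let $\sigma \le 1/2$, or $\sigma \le 1$ if $\mathcal{F}$ is $P$-centered, and $V$ be any two
numbers satisfying

$$\sigma^2 \ge \|Pf^2\|_{\mathcal{F}}, \qquad V \ge n\sigma^2 + 2E\left\|\sum_{i=1}^n (f(X_i) - Pf)\right\|_{\mathcal{F}}.$$

Bousquet's [?] version of Talagrand's inequality then states: for every $u > 0$,

$$\Pr\left\{\left\|\sum_{i=1}^n (f(X_i) - Pf)\right\|_{\mathcal{F}} \ge E\left\|\sum_{i=1}^n (f(X_i) - Pf)\right\|_{\mathcal{F}} + u\right\} \le \exp\left(-\frac{u^2}{2V + \frac{2}{3}u}\right). \qquad (13)$$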



A consequence of this inequality, derived in Section 3.1 in [15], is the following. If
$T = [0,1]$, $P$ has bounded Lebesgue density $f$ on $T$, and $f_n(j) = \int_0^1 K_j(\cdot, y)\,dP_n(y)$, then
for $M$ large enough, every $j \ge 0$, $n \in \mathbb{N}$ and some positive constants $c, c'$ depending on
$U$ and the wavelet regularity $S$,



$$\sup_{f : \|f\|_\infty \le U} \Pr_f\left\{\|f_n(j) - Ef_n(j)\|_2 > M\sqrt{\frac{2^j}{n}}\right\} \le c' e^{-cM^2 2^j}. \qquad (14)$$



4.2 A General Purpose Test for Composite Nonparametric Hypotheses 

In this subsection we construct a general test for composite nonparametric null
hypotheses that lie in a fixed Sobolev ball, under assumptions only on the entropy of the
null-model. While of independent interest, the result will be a key step in the proofs of
Theorems 1 and 5.

Let $X, X_1, \dots, X_n$ be i.i.d. with common probability density $f$ on $[0,1]$, let $\Sigma$ be any
subset of a fixed Sobolev ball $\Sigma(t, B)$ for some $t > 1/2$ and consider testing

$$H_0 : f \in \Sigma \quad \text{against} \quad H_1 : f \in \Sigma(t, B) \setminus \Sigma,\ \|f - \Sigma\|_2 \ge \rho_n, \qquad (15)$$

where $\rho_n \ge 0$ is a sequence of nonnegative real numbers. For $\{\psi_{lk}\}$ an $S$-regular wavelet
basis, $S > t$, $J_n \ge J_0$ a sequence of positive integers such that $2^{J_n} \simeq n^{1/(2t+1/2)}$, and for
$g \in \Sigma$, define the $U$-statistic

$$T_n(g) = \frac{2}{n(n-1)} \sum_{i<j} \sum_{l=J_0}^{J_n-1} \sum_{k \in \mathcal{Z}_l} (\psi_{lk}(X_i) - \langle \psi_{lk}, g \rangle)(\psi_{lk}(X_j) - \langle \psi_{lk}, g \rangle) \qquad (16)$$

and, for $\tau_n$ some thresholds to be chosen below, the test statistic

$$\Psi_n = 1\left\{\inf_{g \in \Sigma} |T_n(g)| > \tau_n\right\}. \qquad (17)$$

Measurability of the infimum in (17) can be established by standard compactness/continuity
arguments.
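
To make the structure of the test concrete, the following is a minimal numerical sketch of a statistic of this type. It is not the construction used in the proofs: as labelled assumptions, the wavelet basis is replaced by a cosine system (orthonormal on $[0,1]$), the infimum over $\Sigma$ is replaced by a minimum over a finite list of candidate coefficient vectors, and all function names are illustrative. The sketch also uses the identity $2\sum_{i<j} a_i a_j = (\sum_i a_i)^2 - \sum_i a_i^2$ to avoid the double sum over pairs.

```python
import math


def cosine_basis(size):
    """Stand-in orthonormal family e_m(x) = sqrt(2) cos(pi m x), m = 1..size, on [0, 1]."""
    return [
        (lambda x, m=m: math.sqrt(2.0) * math.cos(math.pi * m * x))
        for m in range(1, size + 1)
    ]


def u_statistic(sample, basis, g_coeffs):
    """T_n(g) = 2/(n(n-1)) * sum_{i<j} sum_m (e_m(X_i) - g_m)(e_m(X_j) - g_m)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for e, g_m in zip(basis, g_coeffs):
        centred = [e(x) - g_m for x in sample]
        linear = sum(centred)
        squares = sum(v * v for v in centred)
        # (sum_i a_i)^2 - sum_i a_i^2 equals 2 * sum_{i<j} a_i a_j.
        total += linear * linear - squares
    return total / (n * (n - 1))


def u_statistic_bruteforce(sample, basis, g_coeffs):
    """Direct double sum over pairs i < j, used only as a cross-check."""
    n = len(sample)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            for e, g_m in zip(basis, g_coeffs):
                total += (e(sample[i]) - g_m) * (e(sample[j]) - g_m)
    return 2.0 * total / (n * (n - 1))


def rejects(sample, basis, candidate_coeffs, tau):
    """Test decision 1{ min_g |T_n(g)| > tau } over a finite list of candidates g."""
    return min(abs(u_statistic(sample, basis, g)) for g in candidate_coeffs) > tau
```

The fast version costs a number of operations linear in the sample size for each basis function, whereas the brute-force version is quadratic; the two agree up to floating-point error.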

We shall prove a bound on the sum of the type-one and type-two errors of this test
under some entropy conditions on $\Sigma$, more precisely, on the class of functions

$$\mathcal{G}(\Sigma) = \bigcup_{J > J_0} \left\{ \sum_{l=J_0}^{J-1} \sum_{k \in \mathcal{Z}_l} \psi_{lk}(\cdot) \langle \psi_{lk}, g \rangle : g \in \Sigma \right\}.$$

Recall the usual covering numbers $N(\varepsilon, \mathcal{G}, L^2(P))$ and bracketing metric entropy numbers
$N_{[]}(\varepsilon, \mathcal{G}, L^2(P))$ for classes $\mathcal{G}$ of functions and probability measures $P$ on $[0,1]$ (e.g.,
[?]).

Definition 2. Say that $\Sigma$ is $s$-regular if one of the following conditions is satisfied for
some fixed finite constant $A$ and every $0 < \varepsilon < A$:

a) For any probability measure $Q$ on $[0,1]$ (and $A$ independent of $Q$) we have

$$\log N(\varepsilon, \mathcal{G}(\Sigma), L^2(Q)) \le (A/\varepsilon)^{1/s}.$$

b) For $P$ such that $dP = f\,d\lambda$ with Lebesgue density $f : [0,1] \to [0,\infty)$ we have

$$\log N_{[]}(\varepsilon, \mathcal{G}(\Sigma), L^2(P)) \le (A/\varepsilon)^{1/s}.$$

Note that a ball $\Sigma(s, B)$ satisfies this condition for the given $s$, $1/2 < s < S$, since
any element of $\mathcal{G}(\Sigma(s, B))$ has $\|\cdot\|_{s,2}$-norm no more than $B$, and since

$$\log N(\varepsilon, \Sigma(s, B), \|\cdot\|_\infty) \le (A/\varepsilon)^{1/s},$$

see, e.g., p. 506 in [?].

Proposition 2. Let

$$\tau_n = L d_n \max(n^{-2s/(2s+1)}, n^{-2t/(2t+1/2)}), \qquad \rho_n^2 \ge L_0 \tau_n$$

for real numbers $1 \le d_n \le d(\log n)^\gamma$ and positive constants $L, L_0, \gamma, d$. Let the hypotheses
$H_0, H_1$ be as in (15), the test $\Psi_n$ as in (17), and assume $\Sigma$ is $s$-regular for some $s > 1/2$.
Then for $L = L(B, t, S)$, $L_0 = L_0(L, B, t, S)$ large enough and every $n \in \mathbb{N}$ there exist
constants $c_i$, $i = 1, \dots, 3$, depending only on $L, L_0, t, B$ such that

$$\sup_{f \in H_0} E_f \Psi_n + \sup_{f \in H_1} E_f(1 - \Psi_n) \le c_1 e^{-d_n^2} + c_2 e^{-c_3 n \rho_n^2}.$$

The main idea of the proof is as follows: for the type-one errors our test-statistic
is dominated by a degenerate $U$-statistic which we can bound with inequality (12),
carefully controlling the four regimes present. For the alternatives the test statistic can
be decomposed into a degenerate $U$-statistic which can be dealt with as before, and
a linear part, which is the critical one. The latter can be compared to a ratio-type
empirical process which we control by a slicing argument applied to $\Sigma$, combined with
Talagrand's inequality.

Proof. 1) We first control the type-one errors. Since $f \in H_0 = \Sigma$ we see

$$E_f \Psi_n = \Pr_f\left\{\inf_{g \in \Sigma} |T_n(g)| > \tau_n\right\} \le \Pr_f\{|T_n(f)| > \tau_n\}. \qquad (18)$$

$T_n(f)$ is a $U$-statistic with kernel

$$R_f(x, y) = \sum_{l=J_0}^{J_n-1} \sum_{k \in \mathcal{Z}_l} (\psi_{lk}(x) - \langle \psi_{lk}, f \rangle)(\psi_{lk}(y) - \langle \psi_{lk}, f \rangle),$$

which satisfies $E_f R_f(x, X_1) = 0$ for every $x$, since $E_f(\psi_{lk}(X) - \langle \psi_{lk}, f \rangle) = 0$ for every
$k, l$. Consequently $T_n(f)$ is a degenerate $U$-statistic of order two, and we can apply
inequality (12) to it, which we shall do with $u = d_n^2$. We thus need to bound the
constants $\Lambda_1, \dots, \Lambda_4$ occurring in inequality (12) in such a way that, for $L$ large enough,

$$\frac{2C}{n(n-1)} (\Lambda_1 d_n + \Lambda_2 d_n^2 + \Lambda_3 d_n^3 + \Lambda_4 d_n^4) \le L d_n n^{-2t/(2t+1/2)} \le \tau_n, \qquad (19)$$

which is achieved by the following estimates, noting that $n^{-2t/(2t+1/2)} \simeq 2^{J_n/2}/n$.

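First, by standard $U$-statistic arguments, we can bound $ER_f^2(X_1, X_2)$ by the second
moment of the uncentred kernel, and thus, using orthonormality of $\psi_{lk}$,

$$ER_f^2(X_1, X_2) \le \int\int \left( \sum_{l=J_0}^{J_n-1} \sum_{k \in \mathcal{Z}_l} \psi_{lk}(x)\psi_{lk}(y) \right)^2 f(x) f(y)\,dx\,dy$$
$$\le \|f\|_\infty^2 \sum_{l=J_0}^{J_n-1} \sum_{k \in \mathcal{Z}_l} \int \psi_{lk}^2(x)\,dx \int \psi_{lk}^2(y)\,dy$$
$$\le C(S) 2^{J_n} \|f\|_\infty^2$$

for some constant $C(S)$ that depends only on the wavelet basis. We obtain $\Lambda_1^2 \le
C(S) n(n-1) 2^{J_n} \|f\|_\infty^2 / 2$ and it follows, using (9), that for $L$ large enough and every
$n$,

$$\frac{2C\Lambda_1 d_n}{n(n-1)} \le C(S, B, t)\,\frac{2^{J_n/2} d_n}{n} \le \tau_n/4.$$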

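For the second term note that, using the Cauchy-Schwarz inequality and that $K_j$ is a
projection operator,

$$\left| \int\int \sum_{l=J_0}^{J_n-1} \sum_{k \in \mathcal{Z}_l} \psi_{lk}(x)\psi_{lk}(y)\zeta(x)\xi(y) f(x) f(y)\,dx\,dy \right| = \left| \int K_{J_n}(\zeta f)(y)\,\xi(y) f(y)\,dy \right|$$
$$\le \|K_{J_n}(\zeta f)\|_2 \|\xi f\|_2 \le \|f\|_\infty$$

and similarly

$$|E[E_{X_1}[K_{J_n}(X_1, X_2)]\zeta(X_1)\xi(X_2)]| \le \|f\|_\infty, \qquad |EK_{J_n}(X_1, X_2)| \le \|f\|_\infty.$$

Thus

$$E[R_f(X_1, X_2)\zeta(X_1)\xi(X_2)] \le 4\|f\|_\infty$$

so that, using (9),

$$\frac{2C\Lambda_2 d_n^2}{n(n-1)} \le \frac{C'(B, t) d_n^2}{n} \le \tau_n/4$$

again for $L$ large enough and every $n$.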

For the third term, using the decomposition $R_f(x_1, x) = (r(x_1, x) - E_{X_1} r(X, x)) +
(E_{X,Y}\, r(X, Y) - E_Y r(x_1, Y))$ for $r(x, y) = \sum_{k,l} \psi_{lk}(x)\psi_{lk}(y)$, the inequality $(a + b)^2 \le
2a^2 + 2b^2$ and again orthonormality, we have that for every $x \in \mathbb{R}$,



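$$n|E_{X_1} R_f^2(X_1, x)| \le 2n \left[ \|f\|_\infty \sum_{l=J_0}^{J_n-1} \sum_{k \in \mathcal{Z}_l} \psi_{lk}^2(x) + \|f\|_\infty \|\Pi_{V_{J_n}}(f)\|_2^2 \right]$$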



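so that, using $\|\psi_{lk}\|_\infty \le d 2^{l/2}$, again for $L$ large enough and by (9),

$$\frac{2C\Lambda_3 d_n^3}{n(n-1)} \le C''(B, t)\,\frac{2^{J_n/2} d_n^3}{n^{3/2}} \le \tau_n/4.$$

Finally, we have $\Lambda_4 = \|R_f\|_\infty \le c 2^{J_n}$ and hence

$$\frac{2C\Lambda_4 d_n^4}{n(n-1)} \le C'\,\frac{2^{J_n} d_n^4}{n^2} \le \tau_n/4,$$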



so that we conclude for L large enough and every n G N, from inequality ([T2 

Pr^{|r„(/)| >r„,} <6exp{-d2} 



which completes the bound for the type-one errors in view of (jlSp . 
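2) We now turn to the type-two errors. In this case, for $f \in H_1$,

$$E_f(1-\Psi_n) = \Pr_f\Big\{\inf_{g\in\Sigma}|T_n(g)| \le \tau_n\Big\} \tag{21}$$

and the typical summand of $T_n(g)$ has Hoeffding-decomposition

$$\begin{aligned}
&(\psi_{lk}(X_i) - \langle\psi_{lk},g\rangle)(\psi_{lk}(X_j) - \langle\psi_{lk},g\rangle)\\
&= (\psi_{lk}(X_i) - \langle\psi_{lk},f\rangle + \langle\psi_{lk},f-g\rangle)(\psi_{lk}(X_j) - \langle\psi_{lk},f\rangle + \langle\psi_{lk},f-g\rangle)\\
&= (\psi_{lk}(X_i) - \langle\psi_{lk},f\rangle)(\psi_{lk}(X_j) - \langle\psi_{lk},f\rangle)\\
&\quad + (\psi_{lk}(X_i) - \langle\psi_{lk},f\rangle)\langle\psi_{lk},f-g\rangle + (\psi_{lk}(X_j) - \langle\psi_{lk},f\rangle)\langle\psi_{lk},f-g\rangle\\
&\quad + \langle\psi_{lk},f-g\rangle^2
\end{aligned}$$

so that by the triangle inequality, writing

$$L_n(g) = \frac{2}{n}\sum_{i=1}^n\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}(\psi_{lk}(X_i) - \langle\psi_{lk},f\rangle)\langle\psi_{lk},f-g\rangle \tag{22}$$

for the linear terms, we conclude

$$|T_n(g)| \ge \sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\langle\psi_{lk},f-g\rangle^2 - |T_n(f)| - |L_n(g)| = \|\Pi_{V_{J_n}}(f-g)\|_2^2 - |T_n(f)| - |L_n(g)| \tag{23}$$

for every $g \in \Sigma$.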
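The decomposition behind (23) is a purely algebraic identity, $T_n(g) = T_n(f) + L_n(g) + \sum_{l,k}\langle\psi_{lk},f-g\rangle^2$, and can be checked numerically. The following is a minimal sketch, not part of the original argument: the arrays `a`, `u`, `v` are hypothetical stand-ins for the values $\psi_{lk}(X_i)$, $\langle\psi_{lk},f\rangle$ and $\langle\psi_{lk},g\rangle$, with the pair $(l,k)$ flattened into a single index.

```python
import random

def t_stat(a, v):
    """(2/(n(n-1))) * sum_{i<j} sum_k (a[i][k]-v[k]) * (a[j][k]-v[k])."""
    n, K = len(a), len(v)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum((a[i][k] - v[k]) * (a[j][k] - v[k]) for k in range(K))
    return 2.0 * total / (n * (n - 1))

def linear_term(a, u, v):
    """(2/n) * sum_i sum_k (a[i][k]-u[k]) * (u[k]-v[k]), the analogue of L_n(g)."""
    n, K = len(a), len(u)
    return (2.0 / n) * sum((a[i][k] - u[k]) * (u[k] - v[k])
                           for i in range(n) for k in range(K))

def decomposition_gap(a, u, v):
    """T(v) - [T(u) + L(u, v) + sum_k (u_k - v_k)^2]; zero up to rounding."""
    quad = sum((uk - vk) ** 2 for uk, vk in zip(u, v))
    return t_stat(a, v) - (t_stat(a, u) + linear_term(a, u, v) + quad)

# Illustrative data only (assumed, not from the paper).
rng = random.Random(0)
a = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
u = [rng.uniform(-1, 1) for _ in range(4)]
v = [rng.uniform(-1, 1) for _ in range(4)]
gap = decomposition_gap(a, u, v)
```

Since the identity is exact, the lower bound (23) follows from it by the triangle inequality alone; no probabilistic input is needed at this step.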

We can find random $g_n^* \in \Sigma$ such that $\inf_{g\in\Sigma}|T_n(g)| = |T_n(g_n^*)|$. (If the infimum is not attained the proof below requires obvious modifications; for the case $\Sigma = \Sigma(s,B)$, $s > t$, relevant below, the infimum can be shown to be attained at a measurable minimiser by standard continuity and compactness arguments.) We bound the probability in (21), using (23), by

$$\Pr_f\Big\{|L_n(g_n^*)| \ge \frac{\|\Pi_{V_{J_n}}(f-g_n^*)\|_2^2 - \tau_n}{2}\Big\} + \Pr_f\Big\{|T_n(f)| \ge \frac{\|\Pi_{V_{J_n}}(f-g_n^*)\|_2^2 - \tau_n}{2}\Big\}.$$

Now by the standard approximation bound (cf. (6)) and since $g_n^* \in \Sigma \subset \Sigma(t,B)$,

$$\|\Pi_{V_{J_n}}(f-g_n^*)\|_2^2 \ge \inf_{g\in\Sigma}\|f-g\|_2^2 - c(B)2^{-2J_nt} \ge 4\tau_n \tag{24}$$

for $L_0$ large enough depending only on $B$ and the choice of $L$ from above. We can thus bound the sum of the last two probabilities by

$$\Pr_f\{|L_n(g_n^*)| > \|\Pi_{V_{J_n}}(f-g_n^*)\|_2^2/4\} + \Pr_f\{|T_n(f)| > \tau_n\}.$$

For the second degenerate part the proof of Step 1 applies, as only boundedness of $f$ was used there. In the linear part somewhat more care is necessary. We have

$$\Pr_f\{|L_n(g_n^*)| > \|\Pi_{V_{J_n}}(f-g_n^*)\|_2^2/4\} \le \Pr_f\Big\{\sup_{g\in\Sigma}\frac{|L_n(g)|}{\|\Pi_{V_{J_n}}(f-g)\|_2^2} > \frac14\Big\}. \tag{25}$$



Note that the variance of the linear process from (22) can be bounded, for fixed $g \in \Sigma$, using independence and orthonormality, by

$$\mathrm{Var}_f(|L_n(g)|) \le \frac{4}{n}\int\Big(\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\psi_{lk}(x)\langle\psi_{lk},f-g\rangle\Big)^2 f(x)\,dx \le \frac{4\|f\|_\infty}{n}\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\langle\psi_{lk},f-g\rangle^2 \le \frac{4\|f\|_\infty\|\Pi_{V_{J_n}}(f-g)\|_2^2}{n} \tag{26}$$

so that the supremum in (25) is one of a self-normalised ratio-type empirical process. Such processes can be controlled by slicing the supremum into shells of almost constant variance, cf. Section 5 in [31] or [lit]. Define, for $g \in \Sigma$,

$$\sigma^2(g) := \|\Pi_{V_{J_n}}(f-g)\|_2^2 \ge \|f-g\|_2^2 - c(B)2^{-2J_nt} \ge c\rho_n^2,$$

the inequality holding for $L_0$ large enough and some $c > 0$, as in (24). Define moreover, for $m \in \mathbb Z$, the class of functions

$$\mathcal G_{m,J_n} = \Big\{2\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\psi_{lk}(\cdot)\langle\psi_{lk},f-g\rangle : g \in \Sigma,\ \sigma^2(g) \le 2^{m+1}\Big\}$$

which is uniformly bounded by a constant multiple of $\|f\|_{t,2} + \sup_{g\in\Sigma(t,B)}\|g\|_{t,2} \le 2B$ in view of Q and since $t > 1/2$. Then clearly, in the notation of Subsection 4.1,

$$\sup_{g\in\Sigma:\,\sigma^2(g)\le 2^{m+1}}|L_n(g)| = \|P_n - P\|_{\mathcal G_{m,J_n}}$$



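and we bound the last probability in (25) by

$$\begin{aligned}
&\Pr_f\Big\{\max_{m\in\mathbb Z:\,c'\rho_n^2\le 2^m\le C}\ \sup_{g\in\Sigma:\,2^m\le\sigma^2(g)\le 2^{m+1}}\frac{|L_n(g)|}{\sigma^2(g)} > \frac14\Big\}\\
&\le \sum_{m\in\mathbb Z:\,c'\rho_n^2\le 2^m\le C}\Pr_f\Big\{\sup_{g\in\Sigma:\,\sigma^2(g)\le 2^{m+1}}|L_n(g)| > 2^{m-2}\Big\}\\
&\le \sum_{m\in\mathbb Z:\,c'\rho_n^2\le 2^m\le C}\Pr_f\Big\{\|P_n-P\|_{\mathcal G_{m,J_n}} - E\|P_n-P\|_{\mathcal G_{m,J_n}} > 2^{m-2} - E\|P_n-P\|_{\mathcal G_{m,J_n}}\Big\}
\end{aligned} \tag{27}$$

where we may take $C < \infty$ as $\Sigma \subset \Sigma(t,B)$ is bounded in $L^2$, and where $c'$ is a positive constant such that $c'\rho_n^2 \le 2^m \le c\rho_n^2$ for some $m \in \mathbb Z$. We bound the expectation of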
the empirical process. Both the uniform and the bracketing entropy condition for G{Ti) 
carry over to Uj>o^j,m since translation by / preserves the entropy. Using the standard 
entropy-bound plus chaining moment inequality (3.5) in Theorem 3.1 in ^] in case a) 
of Definition [21 and the second bracketing entropy moment inequality in Theorem 2.14.2 
in in case b), together with the variance bound ([26]) and with ([9]), we deduce 

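$$E\|P_n - P\|_{\mathcal G_{m,J_n}} \le C\Big(\sqrt{\frac{2^m}{n}}\,(2^m)^{-1/(4s)} + \frac{(2^m)^{-1/(2s)}}{n}\Big). \tag{28}$$

We see that

$$2^{m-2} - E\|P_n - P\|_{\mathcal G_{m,J_n}} \ge c_0 2^m$$

for some fixed $c_0$ precisely when $2^m$ is of larger magnitude than $(2^m)^{\frac12-\frac{1}{4s}}n^{-1/2} + (2^m)^{-1/(2s)}n^{-1}$, equivalent to $2^m \ge c''n^{-2s/(2s+1)}$ for some $c'' > 0$, which is satisfied since $2^m \ge c'\rho_n^2 \ge c''n^{-2s/(2s+1)}$ if $L_0$ is large enough, by hypothesis on $\rho_n$. We can thus bound the last probability in (27) by

$$\sum_{m\in\mathbb Z:\,c'\rho_n^2\le 2^m\le C}\Pr_f\Big\{n\|P_n-P\|_{\mathcal G_{m,J_n}} - nE\|P_n-P\|_{\mathcal G_{m,J_n}} > c_0 n2^m\Big\}.$$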
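The equivalence invoked after (28) can be verified term by term. The following worked step is a sketch that writes $x = 2^m$ and ignores multiplicative constants; it uses only the two terms on the right-hand side of (28).

```latex
% First term of (28): x dominates x^{1/2 - 1/(4s)} n^{-1/2}
x \ge x^{\frac12-\frac{1}{4s}}\,n^{-1/2}
\iff x^{\frac{2s+1}{4s}} \ge n^{-1/2}
\iff x \ge n^{-\frac{2s}{2s+1}} .

% Second term of (28): x dominates x^{-1/(2s)} n^{-1}
x \ge x^{-\frac{1}{2s}}\,n^{-1}
\iff x^{\frac{2s+1}{2s}} \ge n^{-1}
\iff x \ge n^{-\frac{2s}{2s+1}} .
```

Both terms therefore lead to the same threshold $n^{-2s/(2s+1)}$, which is why a single condition on $2^m$ suffices.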

To this expression we can apply Talagrand's inequality (13), noting that the supremum over $\mathcal G_{m,J_n}$ can be realised, by continuity, as one over a countable subset of $\Sigma$, and since $\Sigma$ is uniformly bounded by $\sup_{f\in\Sigma(t,B)}\|f\|_\infty \le U \equiv U(t,B)$. Renormalising by $U$ and using (13), (26), (28) we can bound the expression in the last display, up to multiplicative constants, by

$$\sum_{m\in\mathbb Z:\,c'\rho_n^2\le 2^m\le C}\exp\Big\{-c_1\frac{n^2(2^m)^2}{n2^m + nE\|P_n-P\|_{\mathcal G_{m,J_n}} + n2^m}\Big\} \le \sum_{m\in\mathbb Z:\,c'\rho_n^2\le 2^m\le C} e^{-c_2 n2^m}$$

since $2^m \ge c'\rho_n^2 \gg n^{-1}$, which completes the proof. □

4.3 Proof of Theorem 2

Proof. We construct a standard Lepski type estimator: choose integers $j_{\min}, j_{\max}$ such that $J_0 \le j_{\min} \le j_{\max}$,

$$2^{j_{\min}} \simeq n^{1/(2R+1)} \quad\text{and}\quad 2^{j_{\max}} \simeq n^{1/(2r+1)}$$

and define the grid

$$\mathcal J := \mathcal J_n = [j_{\min}, j_{\max}] \cap \mathbb N.$$

Let $f_n(j) \equiv f_n(j,\cdot) = \int_0^1 K_j(\cdot,y)\,dP_n(y)$ be a linear wavelet estimator based on wavelets of regularity $S > R$. To simplify the exposition we prove the result for $\|f\|_\infty$ known, otherwise the result follows from the same proof, with $\|f\|_\infty$ replaced by $\|f_n(j_{\max})\|_\infty$, a consistent estimator for $\|f\|_\infty$ that satisfies sufficiently tight uniform exponential error bounds (using inequality (26) in [15] and proceeding as in Step (II) on p. 1157 in [14]). Set

$$\hat j_n = \min\Big\{j \in \mathcal J : \|f_n(j) - f_n(l)\|_2^2 \le C(S)(\|f\|_\infty \vee 1)\frac{2^l}{n} \ \ \forall\, l > j,\ l \in \mathcal J\Big\} \tag{29}$$

where $C(S)$ is a large enough constant, to be chosen below, in dependence of the wavelet basis. The adaptive estimator is $f_n \equiv f_n(\hat j_n)$. We shall need the standard estimates

$$E\|f_n(j) - Ef_n(j)\|_2^2 \le D\frac{2^j}{n} := D\sigma^2(j,n) \tag{30}$$

and, for $f \in W^s$, $s \in [r,R]$,

$$\|Ef_n(j) - f\|_2 \le 2^{-js}\|f\|_{s,2} := B(j,f) \tag{31}$$

for constants $D, D'$ that depend only on the wavelet basis and on $r, R$. Define $j^* := j^*(f)$ by

$$j^* = \min\{j \in \mathcal J : B(j,f) \le \sqrt D\,\sigma(j,n)\}$$

so that, for every $f \in \Sigma(s,B)$ and $D'' = D''(D,D')$,

$$\sigma^2(j^*,n) \le D''B^{2/(2s+1)}n^{-2s/(2s+1)}. \tag{32}$$

We will consider the cases $\{\hat j_n \le j^*\}$ and $\{\hat j_n > j^*\}$ separately. First, by the definition of $\hat j_n$, $j^*$ and (30), (31), (32),



< C(5)(||/||oo V 1)— + C'a\j*,n) < c" B^'^'''+^^n-^'/^'+^ 
n 

for $C'' = C''(D,D',S,U)$, which is the desired bound. On the event $\{\hat j_n > j^*\}$ we have, using (30) and the definition of $j^*$,

E\\fnQn)-fhI^^^^r} ^ E {E\\Uj)-f\\iy^" Ki„=.})'^' 



< Yl C"'a{j,n).^JPrf{J^=J} 



jej':j>i' 

since $\sup_{j\in\mathcal J}\sigma(j,n) = \sigma(j_{\max},n)$ is bounded in $n$. Now pick any $j \in \mathcal J$ so that $j > j^*$ and denote by $j^-$ the previous element in the grid (i.e. $j^- = j-1$). One has, by definition of $\hat j_n$,

$$\Pr_f\{\hat j_n = j\} \le \sum_{l\in\mathcal J:\,l\ge j}\Pr_f\Big\{\|f_n(j^-) - f_n(l)\|_2 > \sqrt{C(S)(\|f\|_\infty\vee 1)\frac{2^l}{n}}\Big\} \tag{33}$$

and we observe that, by the triangle inequality,

$$\|f_n(j^-) - f_n(l)\|_2 \le \|f_n(j^-) - f_n(l) - Ef_n(j^-) + Ef_n(l)\|_2 + B(j^-,f) + B(l,f),$$

where

$$B(j^-,f) + B(l,f) \le 2B(j^*,f) \le c\,\sigma(j^*,n) \le c'\sigma(l,n)$$

by definition of $j^*$ and since $l > j^- \ge j^*$. Consequently, the probability in (33) is bounded by

$$\Pr_f\Big\{\|f_n(j^-) - f_n(l) - Ef_n(j^-) + Ef_n(l)\|_2 > \Big(\sqrt{C(S)(\|f\|_\infty\vee 1)} - c'\Big)\sigma(l,n)\Big\}, \tag{34}$$

and by inequality (14) above this probability is bounded by a constant multiple of $e^{-2^l}$ if we choose $C(S)$ large enough. This gives the overall bound



l£j:l>j 

which is smaller than a constant multiple times $B^{1/(2s+1)}n^{-s/(2s+1)}$, uniformly in $s \in [r,R]$, $n \in \mathbb N$ and for $B \ge 1$, by definition of $j_{\min}$. This completes the proof. □
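The selection rule (29) used in the proof above can be sketched in code. This is a minimal illustration under stated assumptions, not the construction of the paper: it uses the Haar basis on $[0,1)$ (which does not have the regularity $S > R$ required in the proof), the function names are hypothetical, and the argument `const` stands in for $C(S)(\|f\|_\infty \vee 1)$. By orthonormality, $\|f_n(j) - f_n(l)\|_2^2$ is the sum of the squared empirical coefficients at levels $j, \dots, l-1$.

```python
def haar_psi(l, k, x):
    """Haar wavelet psi_{lk}(x) = 2^{l/2} * psi(2^l x - k) on [0, 1)."""
    t = (2 ** l) * x - k
    if 0.0 <= t < 0.5:
        return 2.0 ** (l / 2)
    if 0.5 <= t < 1.0:
        return -(2.0 ** (l / 2))
    return 0.0

def level_energy(sample, l):
    """Sum over k of the squared empirical wavelet coefficients at level l."""
    n = len(sample)
    return sum((sum(haar_psi(l, k, x) for x in sample) / n) ** 2
               for k in range(2 ** l))

def lepski_level(sample, j_min, j_max, const):
    """Smallest j in [j_min, j_max] such that, for every l > j in the grid,
    ||f_n(j) - f_n(l)||_2^2 <= const * 2^l / n, as in rule (29)."""
    n = len(sample)
    energy = {l: level_energy(sample, l) for l in range(j_min, j_max)}
    for j in range(j_min, j_max + 1):
        if all(sum(energy[m] for m in range(j, l)) <= const * 2 ** l / n
               for l in range(j + 1, j_max + 1)):
            return j
    return j_max  # not reached: j = j_max always passes the (empty) check
```

With a degenerate sample concentrated at one point every level carries energy close to $2^l$, so a small `const` pushes the rule to $j_{\max}$, while a large `const` accepts $j_{\min}$ immediately.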



4.4 Proof of Theorem 3

Proof. A) Suppose for simplicity that the sample size is $2n$, and split the sample into two halves with index sets $\mathcal S^1, \mathcal S^2$, of equal size $n$, write $E_1, E_2$ for the corresponding expectations, and $E = E_1E_2$. Let $f_n = f_n(\hat j_n)$ be the adaptive estimator from the proof of Theorem 2 based on the sample $\mathcal S^1$. One shows by a standard bias-variance decomposition, using $\hat j_n \in \mathcal J$ and $\|K_j(f)\|_{r,2} \le \|f\|_{r,2}$, that for every $\epsilon > 0$ there exists a finite positive constant $B' = B'(\epsilon,B_0)$ satisfying

$$\inf_{f\in\Sigma(r,B_0)}\Pr_f\{\|f_n\|_{r,2} \le B'\} \ge 1-\epsilon.$$

It therefore suffices to prove the theorem on the event $\{\|f_n\|_{r,2} \le B'\}$. For a wavelet basis of regularity $S > R$ and for $J_n \ge J_0$ a sequence of integers such that $2^{J_n} \simeq n^{1/(2r+1/2)}$, define the $U$-statistic

$$U_n(f_n) = \frac{2}{n(n-1)}\sum_{i<j,\ i,j\in\mathcal S^2}\ \sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}(\psi_{lk}(X_i) - \langle\psi_{lk},f_n\rangle)(\psi_{lk}(X_j) - \langle\psi_{lk},f_n\rangle) \tag{35}$$

which has expectation

$$E_2U_n(f_n) = \sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\langle\psi_{lk},f-f_n\rangle^2 = \|\Pi_{V_{J_n}}(f-f_n)\|_2^2.$$
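A statistic of the form (35) need not be computed through the double sum over pairs $i<j$. The sketch below is an illustration only, with hypothetical names: `a[i][k]` stands for $\psi_{lk}(X_i)$ and `c[k]` for $\langle\psi_{lk},f_n\rangle$, with $(l,k)$ flattened into one index. It uses the identity $2\sum_{i<j}d_id_j = (\sum_i d_i)^2 - \sum_i d_i^2$ to reduce the cost from order $n^2K$ to order $nK$.

```python
def u_statistic_naive(a, c):
    """Direct evaluation of (2/(n(n-1))) * sum_{i<j} sum_k d_ik * d_jk,
    where d_ik = a[i][k] - c[k]."""
    n, K = len(a), len(c)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum((a[i][k] - c[k]) * (a[j][k] - c[k]) for k in range(K))
    return 2.0 * total / (n * (n - 1))

def u_statistic_fast(a, c):
    """Same quantity via 2 * sum_{i<j} d_i d_j = (sum_i d_i)^2 - sum_i d_i^2."""
    n, K = len(a), len(c)
    total = 0.0
    for k in range(K):
        d = [a[i][k] - c[k] for i in range(n)]
        s = sum(d)
        sq = sum(x * x for x in d)
        total += s * s - sq
    return total / (n * (n - 1))

# Small hand-checkable example (assumed data): both versions give -1/6.
a_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c_demo = [0.5, 0.5]
```

The example also shows that the statistic can be negative for a given sample, although its expectation is the nonnegative quantity $\|\Pi_{V_{J_n}}(f-f_n)\|_2^2$.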

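Using Chebyshev's inequality and that, by definition of the norm (6),

$$\sup_{h\in\Sigma(r,b)}\|\Pi_{V_{J_n}}(h) - h\|_2^2 \le c(b)2^{-2J_nr}$$

for every $0 < b < \infty$ and some finite constant $c(b)$, we deduce

$$\begin{aligned}
&\inf_{f\in\Sigma(r,B_0)}\Pr_{f,2}\Big\{U_n(f_n) - \|f-f_n\|_2^2 \ge -(c(B_0)+c(B'))2^{-2J_nr} - z(\alpha)\tau_n(f)\Big\}\\
&\ge \inf_{f\in\Sigma(r,B_0)}\Pr_{f,2}\Big\{U_n(f_n) - \|\Pi_{V_{J_n}}(f-f_n)\|_2^2 \ge -z(\alpha)\tau_n(f)\Big\}\\
&\ge 1 - \sup_{f\in\Sigma(r,B_0)}\frac{\mathrm{Var}_2(U_n(f_n) - E_2U_n(f_n))}{(z(\alpha)\tau_n(f))^2}.
\end{aligned}$$

We now show that the last quantity is greater than or equal to $1 - z(\alpha)^{-2} \ge 1-\alpha$ for quantile constants $z(\alpha)$ and with

$$\tau_n^2(f) = \frac{C(S)2^{J_n}\|f\|_\infty^2}{n(n-1)} + \frac{4\|f\|_\infty}{n}\|\Pi_{V_{J_n}}(f-f_n)\|_2^2,$$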
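which in turn gives the honest confidence set under $\Pr_f$

$$C_n(\|f\|_\infty,B_0) = \Big\{f : \|f-f_n\|_2 \le \sqrt{z_\alpha\tau_n(f) + U_n(f_n) + (c(B_0)+c(B'))2^{-2J_nr}}\Big\}. \tag{36}$$

We shall comment on the role of the constants $\|f\|_\infty, c(B_0), c(B')$ at the end of the proof, and establish the last claim first: note that the Hoeffding decomposition for the centered $U$-statistic with kernel

$$R(x,y) = \sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}(\psi_{lk}(x) - \langle\psi_{lk},f_n\rangle)(\psi_{lk}(y) - \langle\psi_{lk},f_n\rangle)$$

is (cf. the proof of Theorem 4.1 in [28])

$$U_n(f_n) - E_2U_n(f_n) = \frac{2}{n}\sum_{i=1}^n(\pi_1R)(X_i) + \frac{2}{n(n-1)}\sum_{i<j}(\pi_2R)(X_i,X_j) \equiv L_n + D_n$$

where

$$(\pi_1R)(x) = \sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}(\psi_{lk}(x) - \langle\psi_{lk},f\rangle)\langle\psi_{lk},f-f_n\rangle$$

and

$$(\pi_2R)(x,y) = \sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}(\psi_{lk}(x) - \langle\psi_{lk},f\rangle)(\psi_{lk}(y) - \langle\psi_{lk},f\rangle).$$

The variance of $U_n(f_n) - E_2U_n(f_n)$ is the sum of the variances of the two terms in the Hoeffding decomposition. For the linear term we bound the variance $\mathrm{Var}_2(L_n)$ by the second moment, using orthonormality of the $\psi_{lk}$'s,

$$\frac{4}{n}\int\Big(\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\psi_{lk}(x)\langle\psi_{lk},f_n-f\rangle\Big)^2 f(x)\,dx \le \frac{4\|f\|_\infty}{n}\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\langle\psi_{lk},f_n-f\rangle^2$$

which equals the second term in the definition of $\tau_n^2(f)$. For the degenerate term we can bound $\mathrm{Var}_2(D_n)$ analogously by the second moment of the uncentered kernel (cf. after (19)), i.e., by

$$\frac{2}{n(n-1)}\int\!\!\int\Big(\sum_{l=J_0}^{J_n-1}\sum_{k\in\mathcal Z_l}\psi_{lk}(x)\psi_{lk}(y)\Big)^2 f(x)\,dx\,f(y)\,dy \le \frac{C(S)2^{J_n}\|f\|_\infty^2}{n(n-1)}$$

using orthonormality and the cardinality properties of $\mathcal Z_l$.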

The so constructed confidence set has an adaptive expected maximal diameter: let $f \in \Sigma(s,B)$ for some $s \in [r,R]$ and some $1 \le B \le B_0$. The nonrandom terms are of order

$$\sqrt{c(B_0)+c(B')}\,2^{-J_nr} + \|f\|_\infty^{1/2}2^{J_n/4}n^{-1/2} \le C(S,B_0,B',r,U)\,n^{-r/(2r+1/2)}$$

which is $o(n^{-s/(2s+1)})$ since $s \le R < 2r$. The random component of $\tau_n(f)$ has order $U^{1/4}n^{-1/4}E_1\|\Pi_{V_{J_n}}(f_n-f)\|_2^{1/2}$ which is also $o(n^{-s/(2s+1)})$ for $s < 2r$, since $\Pi_{V_{J_n}}$ is a projection operator and since $f_n$ is adaptive, as established in Theorem 2. Moreover, by Theorem 2 and again the projection properties,

$$EU_n(f_n) = E_1\|\Pi_{V_{J_n}}(f_n-f)\|_2^2 \le E_1\|f_n-f\|_2^2 \le cB^{2/(2s+1)}n^{-2s/(2s+1)}.$$

The term in the last display is the leading term in our bound for the diameter of the confidence set, and shows that $C_n$ adapts to both $B$ and $s$ in the sense of Definition 1, using Markov's inequality.

The confidence set $C_n(\|f\|_\infty,B_0)$ is not feasible if $B_0$ and $\|f\|_\infty$ are unknown, so in particular under the assumptions of Theorem 3, but $C_n$ independent of $B_0$, $\|f\|_\infty$ can be constructed as follows: we replace $c(B_0) + c(B')$ in the definition of (36) by a divergent sequence of positive real numbers $c_n$, which can still be accommodated in the diameter estimate from the last paragraph since $n^{-2r/(2r+1/2)}c_n$ is still $o(n^{-2s/(2s+1)})$ as long as $s \le R < 2r$ for $c_n$ diverging slowly enough (e.g., like $\log n$). Define thus the confidence set

$$C_n = \Big\{f : \|f-f_n\|_2 \le \sqrt{z_\alpha\tau_n(f) + U_n(f_n) + c_n2^{-2J_nr}}\Big\} \tag{37}$$

with $\|f\|_\infty$ replaced by $\|f_n(j_{\max})\|_\infty$ in all expressions where $\|f\|_\infty$ occurs. As stated before (29), $\|f_n(j_{\max})\|_\infty$ concentrates around $\|f\|_\infty$ with exponential error bounds, so that the sufficiency part of Theorem 3 then holds for this $C_n$ with slightly increased $z_\alpha$.

B) Necessity of $R \le 2r$ follows immediately from Part B of Theorem 1. That $R < 2r$ is also necessary is proved in Subsection 4.8 below. □

4.5 Proof of Theorem 1

Proof. That an $L^2$-adaptive confidence set exists when $s \le 2r$ follows from Theorem 3. The case $s < 2r$ is immediate, and the case $s = 2r$ follows using the confidence set (36). This set is feasible since, under the hypotheses of Theorem 1, $B = B_0$ is known, as is $B'$ and the upper bound for $\|f\|_\infty$ (cf. Q). It is further adaptive since

$$n^{-r/(2r+1/2)} \le n^{-s/(2s+1)} \quad\text{for } s \le 2r.$$
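The rate comparison in the last display reduces to elementary algebra on the exponents. A worked version, for $n > 1$ and $r, s > 0$:

```latex
n^{-r/(2r+1/2)} \le n^{-s/(2s+1)}
\iff \frac{r}{2r+\tfrac12} \ge \frac{s}{2s+1}
\iff r(2s+1) \ge s\bigl(2r+\tfrac12\bigr)
\iff r \ge \tfrac{s}{2}
\iff s \le 2r .
```

The boundary case $s = 2r$ gives equality of the two rates, which is why the set (36) remains adaptive there.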

For part Aii we use the test from Proposition 2 with $\Sigma = \Sigma(s)$, $t = r$, and define a confidence ball as follows. Take $f_n = f_n(\hat j_n)$ to be the adaptive estimator from the proof of Theorem 2 and let, for $0 < L' < \infty$,

$$C_n = \begin{cases}\{f \in \Sigma(r) : \|f-f_n\|_2 \le L'n^{-s/(2s+1)}\} & \text{if } \Psi_n = 0,\\ \{f \in \Sigma(r) : \|f-f_n\|_2 \le L'n^{-r/(2r+1)}\} & \text{if } \Psi_n = 1.\end{cases}$$

We first prove that $C_n$ is honest for $\Sigma(s) \cup \Sigma(r,\rho_n)$ if we choose $L'$ large enough. For $f \in \Sigma(s)$ we have from Theorem 2, by Markov's inequality,

$$\inf_{f\in\Sigma(s)}\Pr_f\{f \in C_n\} \ge 1 - \sup_{f\in\Sigma(s)}\Pr_f\{\|f_n-f\|_2 > L'n^{-s/(2s+1)}\} \ge 1 - \frac{n^{s/(2s+1)}}{L'}\sup_{f\in\Sigma(s)}E_f\|f_n-f\|_2 \ge 1 - \frac{c(B,s,r)}{L'}$$

which can be made greater than $1-\alpha$ for any $\alpha > 0$ by choosing $L'$ large enough depending only on $B, \alpha, r, s$. When $f \in \Sigma(r,\rho_n)$ we have, using again Markov's inequality,

$$\inf_{f\in\Sigma(r,\rho_n)}\Pr_f\{f \in C_n\} \ge 1 - \frac{\sup_{f\in\Sigma(r,\rho_n)}E_f\|f_n-f\|_2}{L'n^{-r/(2r+1)}} - \sup_{f\in\Sigma(r,\rho_n)}\Pr_f\{\Psi_n = 0\}.$$

The first subtracted term can be made smaller than $\alpha/2$ for $L'$ large enough as before. The second subtracted term can also be made less than $\alpha/2$ using Proposition 2 and the remark preceding it, choosing $M$ and $d_n$ to be large but also bounded in $n$. This proves that $C_n$ is honest. We now turn to adaptivity of $C_n$: by the definition of $C_n$ we always have $|C_n| \le L'n^{-r/(2r+1)}$, so the case $f \in \Sigma(r,\rho_n)$ is proved. If $f \in \Sigma(s)$ then using Proposition 2 again, for $M, d_n$ large enough depending on $\alpha'$ but bounded in $n$,

$$\Pr_f\{|C_n| > L'n^{-s/(2s+1)}\} = \Pr_f\{\Psi_n = 1\} \le \alpha'$$

which completes the proof of part A.

To prove part B of Theorem 1 we argue by contradiction and assume that the limit inferior equals zero. We then pass to a subsequence of $n$ for which the limit is zero, and still denote this subsequence by $n$. Let $f_0 \equiv 1 \in \Sigma(s)$, suppose $C_n$ is adaptive and honest for $\Sigma(s) \cup \Sigma(r,\rho_n)$ for every $\alpha, \alpha'$, and consider testing

$$H_0 : f = f_0 \quad\text{against}\quad H_1 : f \in \Sigma(r,\rho_n)$$

where $\rho_n = o(n^{-r/(2r+1/2)})$. Since $s > 2r$ we may assume $n^{-s/(2s+1)} = o(\rho_n)$ (otherwise replace $\rho_n$ by $\rho_n' \ge \rho_n$ s.t. $n^{-s/(2s+1)} = o(\rho_n')$). Accept $H_0$ if $C_n \cap \Sigma(r,\rho_n)$ is empty and reject otherwise, formally

$$\Psi_n = 1\{C_n \cap \Sigma(r,\rho_n) \ne \emptyset\}.$$

The type-one errors of this test satisfy

$$E_{f_0}\Psi_n = \Pr_{f_0}\{C_n \cap \Sigma(r,\rho_n) \ne \emptyset\} \le \Pr_{f_0}\{f_0 \in C_n, |C_n| \ge \rho_n\} + \Pr_{f_0}\{f_0 \notin C_n\} \le \alpha + \alpha' + r_n \to \alpha + \alpha'$$

as $n \to \infty$ by the hypothesis of coverage and adaptivity of $C_n$. The type-two errors satisfy, by coverage of $C_n$, as $n \to \infty$,

$$E_f(1-\Psi_n) = \Pr_f\{C_n \cap \Sigma(r,\rho_n) = \emptyset\} \le \Pr_f\{f \notin C_n\} \le \alpha + r_n \to \alpha,$$

uniformly in $f \in \Sigma(r,\rho_n)$. We conclude that this test satisfies

$$\limsup_n\Big[E_{f_0}\Psi_n + \sup_{f\in H_1}E_f(1-\Psi_n)\Big] \le 2\alpha + \alpha'$$

for arbitrary $\alpha, \alpha' > 0$. For $\alpha, \alpha'$ small enough this contradicts (the proof of) Theorem 1i in [19], which implies that the limit inferior of the term in brackets in the last display, even with an infimum over all tests, exceeds a fixed positive constant. Indeed, the alternatives (6) in [19] can be taken to be

$$f_i(x) = 1 + \epsilon2^{-j_n(r+1/2)}\sum_{k\in\mathcal Z_{j_n}}\beta_{ik}\psi_{j_nk}(x), \quad i = 1,\dots,2^{2^{j_n}},$$

for $\epsilon > 0$ a small constant, $\beta_{ik} = \pm1$, and with $j_n$ such that $2^{j_n} \simeq n^{1/(2r+1/2)}$. Since



gGS{s) 



for every $\epsilon > 0$, some $c > 0$ and $n$ large enough, these alternatives are also contained in our $H_1$, so that the proof of the lower bound Theorem 1i in [19] applies also in the present situation. □

4.6 Proof of Theorem 5

We shall write for S(s,i?o) and T,n{s) for in this proof, and we write 

$\Sigma_n(s_N)$ also for $\Sigma(s_N)$ in slight abuse of notation. For $i = 1,\dots,N$, let $\Psi(i)$ be the test from (17) with $\Sigma = \Sigma(s_{i+1})$ and $t = s_i$. Starting from the largest model we first test $H_0 : f \in \Sigma(s_2)$ against $H_1 : f \in \Sigma_n(s_1)$, accepting $H_0$ if $\Psi(1) = 0$. If $H_0$ is rejected we set $\hat s_n = s_1 = r$, otherwise we proceed to test $H_0 : f \in \Sigma(s_3)$ against $H_1 : f \in \Sigma_n(s_2)$ using $\Psi(2)$ and iterating this procedure downwards we define $\hat s_n$ to be the first element $s_i$ in $\mathcal S$ for which $\Psi(i) = 1$ rejects. If no rejection occurs we set $\hat s_n$ equal to $s_N$, the last element in the grid.
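The sequential testing procedure just described can be sketched as a short function. This is a minimal illustration with hypothetical names: `grid` holds $s_1 < \dots < s_N$ and `reject(i)` reports whether the test $\Psi(i)$ rejects. The sketch consults only $\Psi(1), \dots, \Psi(N-1)$, since $\Psi(i)$ compares $s_{i+1}$ against $s_i$; that cutoff is an assumption of the sketch.

```python
def select_smoothness(grid, reject):
    """Return the first s_i whose test Psi(i) rejects, else the last grid point.

    grid   -- increasing list [s_1, ..., s_N] of candidate smoothness levels
    reject -- callable; reject(i) is True when Psi(i) = 1 (1-based index i)
    """
    for i in range(1, len(grid)):
        if reject(i):
            return grid[i - 1]  # s_i, with 1-based i
    return grid[-1]  # no rejection occurred: s_N
```

For example, with `grid = [0.5, 1.0, 2.0]` a rejection at the first step returns `0.5`, and no rejection at all returns `2.0`.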

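For $f \in \mathcal P_n(M,\mathcal S)$ define the unique $s_{i_0} := s_{i_0}(f) = \{s \in \mathcal S : f \in \Sigma_n(s)\}$. We now show that for $M$ large enough

$$\sup_{f\in\mathcal P_n(M,\mathcal S)}\Pr_f\{\hat s_n \ne s_{i_0}(f)\} \le \max(\alpha,\alpha')/2. \tag{38}$$

Indeed, if $\hat s_n < s_{i_0}$ then the test $\Psi(i)$ has rejected for some $i < i_0$. In this case $f \in \Sigma_n(s_{i_0}) \subset \Sigma(s_{i_0}) \subset \Sigma(s_{i+1})$ for every $i < i_0$, and thus,

$$\Pr_f\{\hat s_n < s_{i_0}\} = \Pr_f\Big\{\bigcup_{i<i_0}\{\Psi(i) = 1\}\Big\} \le \sum_{i<i_0}E_f\Psi(i) \le C(N)e^{-cd_n^2} \le \max(\alpha,\alpha')/2$$

using Proposition 2 and the remark preceding it, choosing $M$ and $d_n$ to be large but also bounded in $n$. On the other hand if $\hat s_n > s_{i_0}$ (ignoring the trivial case $s_{i_0} = s_N$) then $\Psi(i_0)$ has accepted despite $f \in \Sigma_n(s_{i_0})$. Thus

$$\Pr_f\{\hat s_n > s_{i_0}\} \le \sup_{f\in\Sigma_n(s_{i_0})}E_f(1-\Psi(i_0)) \le Ce^{-cd_n^2} \le \max(\alpha,\alpha')/2$$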

again by Proposition 2, for $M, d_n$ large enough.

Denote now by $C_n(s_i)$ the confidence set (36) constructed in the proof of Theorem 3 with $r$ there being $s_i$, with $R = 2s_i = s_{i+1}$, with $\|f\|_\infty$ replaced by $U$ and with $z_\alpha$ such that the asymptotic coverage level is $\alpha/2$ for any $f \in \Sigma(s_i)$. We then set $C_n = C_n(\hat s_n)$, which is a feasible confidence set as $B_0, r, U$ are known under the hypotheses of the theorem. We then have, from the proof of Theorem 3, uniformly in $f \in \Sigma_n(s_{i_0}) \subset \Sigma(s_{i_0})$,

$$\Pr_f\{f \in C_n(\hat s_n)\} \ge \Pr_f\{f \in C_n(s_{i_0})\} - \alpha/2 \ge 1-\alpha.$$

Moreover, if $f \in \Sigma(s,B) \cap \mathcal P_n(M,\mathcal S)$ for some $1 \le B \le B_0$ and for either $s \in [s_{i_0}, s_{i_0+1})$ or $s \in [s_N, R]$ (in case $s_{i_0} = s_N$), the expected diameter of $C_n$ satisfies, by the estimates in the proof of Theorem 3,

PTf{\Cn{Sn)\ > Ci?2/(2.+l)^-s/(2s+l)| 

< PTf{\Cn{Si,)\ > C52/(2s+l)^-s/{2«+l)| ^ ^//2 

<a' 

for C large enough, so that this confidence set is adaptive as well, which completes the 
proof. 

4.7 Proof of Theorem 4

Proof. Suppose such C„ exists. We will construct functions fm G W^,m = 0, 1, . . . , and 
a further function /oo G W^' , which serve as hypotheses for /. For each m G N, we 
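will ensure that, at some time $n_m$, $C_n$ cannot distinguish between $f_m$ and $f_\infty$, and is too small to contain both simultaneously. We will thereby obtain a subsequence $n_m$ on which, for $\delta = \frac15(1-2\alpha)$,

$$\sup_m\Pr_{f_\infty}\{f_\infty \in C_{n_m}\} \le 1-\alpha-\delta,$$

contradicting our assumptions on $C_n$.

For $m = 0,1,2,\dots,\infty$, construct functions $f_0 \equiv 1$,

$$f_m = 1 + \epsilon\sum_{i=1}^m 2^{-j_i(r+1/2)}\sum_{k\in\mathcal Z_{j_i}}\beta_{ik}\psi_{j_ik},$$

where $\epsilon > 0$ is a constant, and the parameters $j_1, j_2, \dots \in \mathbb N$, $\beta_{ik} = \pm1$ are chosen inductively satisfying $j_i/j_{i-1} \ge 1 + 1/2r$. Pick $\epsilon > 0$ small enough that $\|f_m - f_{m-1}\|_\infty \le 2^{-(m+1)}$ for all $m < \infty$, and any choice of $j_i, \beta_{ik}$. Then

$$f_m = 1 + \sum_{i=1}^m(f_i - f_{i-1}) \ge \frac12$$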

and J fm = (1, fm) = 1, so the fm are densities. By dS]), /m £ W"^ , and for m < oo, also 
fm e W. 

We have already defined $f_0$; for convenience let $n_0 = 1$. Inductively, suppose we have defined $f_{m-1}$, $n_{m-1}$. For $n_m > n_{m-1}$ and $D > 0$ large enough depending only on $f_{m-1}$, we have:

1. $\Pr_{f_{m-1}}\{f_{m-1} \notin C_{n_m}\} \le \alpha + \delta$; and
2. $\Pr_{f_{m-1}}\{|C_{n_m}| \ge Dr_{n_m}\} \le \delta$.

Setting

$$T_n = 1(\exists\, f \in C_n,\ \|f - f_{m-1}\|_2 \ge 2Dr_n),$$

we then have

$$\Pr_{f_{m-1}}\{T_{n_m} = 1\} \le \Pr_{f_{m-1}}\{f_{m-1} \notin C_{n_m}\} + \Pr_{f_{m-1}}\{|C_{n_m}| \ge Dr_{n_m}\} \le \alpha + 2\delta. \tag{39}$$

We claim it is possible to choose $j_m, \beta_{mk}$ and $n_m$, depending only on $f_{m-1}$, so that also:

1. if $m > 1$,

$$3Dr_{n_m} \le \|f_m - f_{m-1}\|_2 \le \tfrac14\|f_{m-1} - f_{m-2}\|_2, \tag{40}$$

and 2. for any further choice of $j_i, \beta_{ik}$,

$$\Pr_{f_\infty}\{T_{n_m} = 0\} \ge 1-\alpha-4\delta. \tag{41}$$

We may then conclude that, since all further choices will satisfy (40),

$$\|f_\infty - f_{m-1}\|_2 \ge \|f_m - f_{m-1}\|_2 - \sum_{i=m+1}^\infty\|f_i - f_{i-1}\|_2 \ge 2Dr_{n_m},$$

and hence

$$\Pr_{f_\infty}\{f_\infty \in C_{n_m}\} \le \Pr_{f_\infty}\{T_{n_m} = 1\} \le \alpha + 4\delta = 1-\alpha-\delta$$

as required.

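It remains to verify the claim. For $j \ge (1+1/2r)j_{m-1}$, $\beta_k = \pm1$, set

$$g_\beta = \epsilon2^{-j(r+1/2)}\sum_{k\in\mathcal Z_j}\beta_k\psi_{jk}$$

and $f_\beta = f_{m-1} + g_\beta$. Allowing $j \to \infty$, set

$$n \sim C2^{j(2r+1/2)}$$

for $C > 0$ to be determined. Then

$$\|g_\beta\|_2 = \epsilon2^{-jr} \approx n^{-r/(2r+1/2)},$$

so for $j$ large enough, $f_\beta$ satisfies (40) with any choice of $\beta$.

The density of $X_1, \dots, X_n$ under $f_\beta$, w.r.t. under $f_{m-1}$, is

$$Z_\beta = \prod_{i=1}^n\frac{f_\beta(X_i)}{f_{m-1}(X_i)}.$$

Set $Z = 2^{-2^j}\sum_\beta Z_\beta$, so $E_{f_{m-1}}[Z] = 1$, and

$$\begin{aligned}
E_{f_{m-1}}[Z^2] &= 2^{-2^{j+1}}\sum_{\beta,\beta'}E_{f_{m-1}}[Z_\beta Z_{\beta'}] = 2^{-2^{j+1}}\sum_{\beta,\beta'}\Big(\int\frac{f_\beta f_{\beta'}}{f_{m-1}}\Big)^n\\
&= 2^{-2^{j+1}}\sum_{\beta,\beta'}\Big(1 + \Big\langle\frac{g_\beta}{\sqrt{f_{m-1}}},\frac{g_{\beta'}}{\sqrt{f_{m-1}}}\Big\rangle\Big)^n\\
&\le 2^{-2^{j+1}}\sum_{\beta,\beta'}(1 + 2\langle g_\beta,g_{\beta'}\rangle)^n\\
&= E[(1 + \epsilon^22^{1-j(2r+1)}Y)^n],
\end{aligned}$$

where $Y = \sum_{i=1}^{2^j}R_i$, for i.i.d. Rademacher random variables $R_i$,

$$\le E[\exp(n\epsilon^22^{1-j(2r+1)}Y)] = \cosh\big(D2^{-j/2}(1+o(1))\big)^{2^j},$$

as $j \to \infty$, for some $D > 0$,

$$= \big(1 + D^22^{-j}(1+o(1))\big)^{2^j} \le \exp(D^2(1+o(1))) \le 1 + \delta^2$$

for $j$ large, $C$ small. Hence $E_{f_{m-1}}[(Z-1)^2] \le \delta^2$, and we obtain

$$\begin{aligned}
\Pr_{f_{m-1}}\{T_n = 1\} + \max_\beta\Pr_{f_\beta}\{T_n = 0\} &\ge \Pr_{f_{m-1}}\{T_n = 1\} + 2^{-2^j}\sum_\beta\Pr_{f_\beta}\{T_n = 0\}\\
&= 1 + E_{f_{m-1}}[(Z-1)1(T_n = 0)]\\
&\ge 1-\delta.
\end{aligned}$$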
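The Rademacher moment computation used in the bound on $E[Z^2]$ above rests on two elementary facts: for $Y$ a sum of $N$ i.i.d. Rademacher variables, $E\exp(\lambda Y) = \cosh(\lambda)^N$ by independence, and $\cosh(x) \le \exp(x^2/2)$. The sketch below checks both numerically by exhaustive enumeration; the function name and the values of `lam` and `N` are illustrative assumptions.

```python
import math
from itertools import product

def rademacher_mgf(lam, N):
    """E exp(lam * (R_1 + ... + R_N)) for i.i.d. Rademacher R_i,
    computed by averaging over all 2^N sign vectors."""
    return sum(math.exp(lam * sum(signs))
               for signs in product((-1, 1), repeat=N)) / 2 ** N

lam, N = 0.3, 4
exact = rademacher_mgf(lam, N)
closed_form = math.cosh(lam) ** N        # product of N identical factors
upper_bound = math.exp(N * lam ** 2 / 2)  # from cosh(x) <= exp(x^2 / 2)
```

In the proof the role of $N$ is played by $2^j$ and that of $\lambda$ by a quantity of order $2^{-j/2}$, so that $N\lambda^2$ stays bounded as $j \to \infty$.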

Set $f_m = f_\beta$, for $\beta$ maximizing this expression. The density of $X_1, \dots, X_n$ under $f_\infty$, w.r.t. under $f_m$, is

$$Z' = \prod_{i=1}^n\frac{f_\infty(X_i)}{f_m(X_i)}.$$

Now, $E_{f_m}[Z'] = 1$, and

$$\|f_\infty - f_m\|_2^2 = \sum_{i=m+1}^\infty\epsilon^22^{-2j_ir} \le E'2^{-2j_{m+1}r} \le E'2^{-j(2r+1)},$$

for some constant $E' > 0$, so similarly

$$E_{f_m}[Z'^2] \le (1 + 2\|f_\infty - f_m\|_2^2)^n \le (1 + E'2^{1-j(2r+1)})^n = \exp\big(F2^{-j/2}(1+o(1))\big),$$

for some $F > 0$,

$$\le 1 + \delta^2,$$

for $j$ large. Hence $E_{f_m}[(Z'-1)^2] \le \delta^2$, and

$$\begin{aligned}
\Pr_{f_{m-1}}\{T_n = 1\} + \Pr_{f_\infty}\{T_n = 0\} &= \Pr_{f_{m-1}}\{T_n = 1\} + E_{f_m}[Z'1(T_n = 0)]\\
&\ge 1-\delta + E_{f_m}[(Z'-1)1(T_n = 0)]\\
&\ge 1-2\delta.
\end{aligned}$$

If we take $j_m = j$, $n_m = n$ large enough also that (39) holds, then $f_\infty$ satisfies (41), and our claim is proved. □

4.8 Proof of Part B of Theorem 3

Proof. Suppose such $C_n$ exists for $R = 2r$. Set $f_0 \equiv 1$, and

$$f_1 = 1 + B2^{-j(r+1/2)}\sum_{k\in\mathcal Z_j}\beta_{jk}\psi_{jk},$$

for $B > 0$, $j > J_0$, and $\beta_{jk} = \pm1$ to be determined. Having chosen $B$, we will pick $j$ large enough that $f_1 \ge \frac12$. Since $\int f_1 = \langle f_1, 1\rangle = 1$, $f_1$ is then a density.

Set $\delta = \frac14(1-2\alpha)$. As $f_0 \in \Sigma(R,1)$, for $n$ and $L$ large we have:

1. $\Pr_{f_0}\{f_0 \notin C_n\} \le \alpha + \delta$; and

2. $\Pr_{f_0}\{|C_n| \ge Ln^{-R/(2R+1)}\} \le \delta$.

Setting $T_n = 1(\exists\, f \in C_n : \|f - f_0\|_2 \ge 2Ln^{-R/(2R+1)})$, we then have

$$\Pr_{f_0}\{T_n = 1\} \le \alpha + 2\delta,$$

as in the proof of Theorem 4.

For a constant $C = C(\delta) > 0$ to be determined, set $B = (3L)^{2R+1}C^{-R}$. Allowing $j \to \infty$, set $n \sim CB^{-2}2^{j(2r+1/2)}$. Then

$$\|f_1 - f_0\|_2 = B2^{-jr} \simeq 3Ln^{-R/(2R+1)},$$

so for $j$ large, $\|f_1 - f_0\|_2 \ge 2Ln^{-R/(2R+1)}$. Arguing as in the proof of Theorem 4, the density $Z$ of $f_1$ w.r.t. $f_0$ has second moment

$$\begin{aligned}
E_{f_0}[Z^2] &\le \cosh\big(nB^22^{1-j(2r+1)}\big)^{2^j}\\
&= \cosh\big(C2^{1-j/2}(1+o(1))\big)^{2^j}\\
&= \big(1 + C^22^{2-j}(1+o(1))\big)^{2^j}\\
&\le \exp(4C^2(1+o(1)))\\
&\le 1 + \delta^2
\end{aligned}$$

for $C(\delta)$ small, $j$ large. Hence

$$\Pr_{f_0}\{T_n = 1\} + \max_\beta\Pr_{f_1}\{T_n = 0\} \ge 1-\delta,$$

and for all $j$ (and $n$) large enough, we obtain, for suitable $\beta$,

$$\Pr_{f_1}\{f_1 \in C_n\} \le \Pr_{f_1}\{T_n = 1\} \le \alpha + 3\delta = 1-\alpha-\delta.$$

Since $f_1 \in \Sigma(r,\cdot)$ for all $n, \beta_{jk}$, this contradicts the definition of $C_n$. □